
(WIP) Backend API restructure: pagination, refactoring, and plans for Gitlab & BitBucket #1134

Closed
Benaiah opened this issue Feb 23, 2018 · 6 comments


@Benaiah
Contributor

Benaiah commented Feb 23, 2018

Summary

This is a design proposal for our backend API - a goal of the 2.0 release. It is also an update on the immediate plans for implementing pagination for the GitLab API. Pagination is an architecturally complex feature, and the current code needs significant refactoring to support more pagination paradigms, such as the links used by the GitLab API or the JSON wrapper used by BitBucket. Developing this feature should therefore be done with thought towards our future API.

I've had extensive discussions with other maintainers and contributors about improvements we want out of this process - these have been a major influence on the design I'm proposing, so I encourage continuing that process with further critique of the suggestions here. While I've written things in an imperative style ("the best way to do this is foo", "we should do bar"), that's just for the sake of brevity. This is all currently in flux, so now is the ideal time to bring up as many concerns as you can think of.

The immediate goals are listed at the end - the proposed design is not intended to be implemented top-to-bottom all at once. It is important to consider, though, since the entire entry-listing process already requires refactoring to support pagination.

Reasons for restructuring

The current pagination code is heavily coupled to a few assumptions which prevent using most REST APIs idiomatically. These include the following:

TODO: link to the code assuming these conditions

  • Any given page can be loaded immediately, given an integer index as the only argument.
  • The page contents are static and new pages can simply be appended to an existing list.
  • The user simply loads one page after another, from beginning to end (which makes the earlier assumption of being able to load any arbitrary page by its index redundant).

The reducer that manages store.entries.pages maintains a single Immutable.js List of ids for each collection, which is simply concatenated to as new pages load. It's the minimum possible implementation for the infinite-scroll pagination used by the Algolia integration. However, it is incompatible with the way most third-party APIs implement pagination, and it prevents us from loading arbitrary pages while keeping their order meaningful.
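To make the current behavior concrete, here is a minimal sketch of that append-only reducer (the action type and payload shape are illustrative, not the actual source):

import { Map, List } from 'immutable';

const defaultState = Map();

// Sketch: ids are concatenated on every load, regardless of which page
// was requested, so the "pages" degrade into one flat, load-ordered list.
function pages(state = defaultState, action) {
  switch (action.type) {
    case 'ENTRIES_SUCCESS': {
      const { collection, entries } = action.payload;
      return state.updateIn([collection, 'ids'], List(), ids =>
        ids.concat(entries.map(entry => entry.slug)));
    }
    default:
      return state;
  }
}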

Usually, APIs return - either in the JSON body or in headers - a set of links which allow for navigation to given pages relative to the current one. In GitLab, for instance, you can navigate to the next, previous, first, or last page via links in specific headers, but the docs specifically warn against trying to construct URLs based on indexes. BitBucket, on the other hand, uses a wrapper with pagination information around the values returned by its API, and only provides previous and next (and previous isn't guaranteed).
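For illustration, GitLab's links arrive in an RFC 5988 Link response header, while BitBucket wraps its results in a JSON envelope. A sketch of extracting the GitLab links (this parser is illustrative, not existing CMS code):

// Parse a GitLab-style Link header, e.g.
// Link: <https://gitlab.example/api/v4/...&page=2>; rel="next", <...&page=1>; rel="prev"
function parseLinkHeader(header) {
  if (!header) { return {}; }
  return header.split(',').reduce((links, part) => {
    const match = part.match(/<([^>]+)>;\s*rel="([^"]+)"/);
    return match ? { ...links, [match[2]]: match[1] } : links;
  }, {});
}

// BitBucket instead returns an envelope along the lines of:
// { pagelen: 10, page: 1, next: "https://...", values: [/* entries */] }
// where `previous` is only present when a previous page exists.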

Additionally, pages can change between requests as entries are added or removed; resolving arbitrary changes in pagination (the sources of which may be well outside the current page), on top of sorting, is a hard problem.

Finally, there are significant improvements we've already discussed which can be leveraged to make dealing with pagination easier.

Current backend API


Init and auth lifecycle


The CMS starts its lifecycle by rendering the App component, which dispatches loadConfig when it mounts:

https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/components/App/App.js#L60-L62

When the config finishes loading in loadConfig, it dispatches authenticateUser:

https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/actions/config.js#L102-L105

authenticateUser dispatches authenticating (AUTH_REQUEST), then calls backend.currentUser:

https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/actions/auth.js#L46-L51

backend.currentUser checks this.user and this.authStore for existing user information, then passes that to this.implementation.restoreUser and resets the authStore user to that, returning null if there is no user info:

https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/backends/backend.js#L95-L107

When the App component is rendering and the user is null, it renders the return value of this.authenticating():

https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/components/App/App.js#L121-L123

this.authenticating gets auth (originally from the redux store) as a prop and creates a backend with currentBackend:

https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/components/App/App.js#L68-L70

currentBackend returns a function that takes a config and returns a memoized backend with resolveBackend, which constructs and returns a backend based on the config:

https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/backends/backend.js#L358-L367

https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/backends/backend.js#L343-L356

Next, App.authenticating renders backend.authComponent():

https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/components/App/App.js#L76-L89

It is passed the following props:

  • onLogin: bound version of the App.handleLogin(credentials) method

  • error: the last auth error (during init this should be null)

  • isFetching: whether the authentication is fetching (during init this is null)

  • base_url: backend.base_url from the config.yml.

  • siteId: backend.site_domain from the config.yml.

(The names of the siteId and base_url props should be made consistent - they will be a public face of our backend API. We may want to rethink how we handle backend configuration as well.)

The component returned from backend.authComponent calls its onLogin prop with a credentials argument (e.g., the GitHub backend). This is passed to App.handleLogin, which dispatches loginUser with the same argument:

https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/components/App/App.js#L64-L66

loginUser dispatches authenticating (AUTH_REQUEST), then calls backend.authenticate:

https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/actions/auth.js#L66-L72

backend.authenticate calls this.implementation.authenticate to get a user, adds a backendName property, caches it, and finally resolves to it:

https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/backends/backend.js#L113-L119

In current implementations, this sets properties on the backend instance itself (this.token, for instance) and initializes the backend API object (these are not technically public interfaces, but they are patterns used by all current backends). (e.g., GitHub backend.authenticate).

Next, loginUser dispatches authenticate (AUTH_SUCCESS), or, on error, dispatches a notification action and then an authError (AUTH_FAILURE):

https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/actions/auth.js#L72-L84

The AUTH_SUCCESS action is received by the auth reducer, which resets the stored auth information in Redux to Map({ user }).

https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/reducers/auth.js#L4-L19

Methods


The methods a backend implementation is required to have are a mixed bag: some are synchronous, some are asynchronous, and some are called with a Promise.resolve wrapper, so a synchronous return works even though the rest of the code treats the operation as asynchronous.
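For example, a synchronous method can be made to look asynchronous at the call site (a simplified illustration of the wrapper pattern, not the actual call sites):

// The core can normalize both styles by wrapping the call:
function runBackendMethod(backend, method, ...args) {
  return Promise.resolve(backend[method](...args));
}

// Either implementation behaves identically from the caller's perspective:
// logout() { clearLocalToken(); }               // sync
// logout() { return api.revokeToken(token); }   // async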

TODO: expand on these methods

Authentication

  1. sync authComponent()

    Returns a component which presents an interface for logging in to the CMS. This is usually either a login form or a button to trigger a login popup.

  2. async authenticate(credentials)

    Calls the backend with the credentials returned from the authComponent. Returns a user object which is stored in Redux (see auth lifecycles above).

  3. sync getToken()

    This is used solely to pass a token to integrations, and is called synchronously.

  4. sync/async logout()

  5. sync/async restoreUser(storedUser)

Published entries

  1. async entriesByFiles(collection, extension)

    This only receives the extension argument because of the dynamic dispatch used for this and entriesByFolder.

  2. async entriesByFolder(collection, extension)

    Note that the existing pagination API is not currently exposed to backends (this would be the most commonly paginated endpoint), as it's only used by an integration.

  3. async getEntry(collection, slug, path)

    The path argument to this strikes me as odd - shouldn't the backend be responsible for resolving a collection/slug pair to a path? This seems to couple the API needlessly to a filesystem-based backend, or at least require other backends to emulate being based on a filesystem.

  4. async persistEntry(entryObj, options)

    TODO: figure out the arguments to this function

    TODO: figure out if the collection object is consistent across calls

Media library

  1. async persistMedia(file, options)

  2. async getMedia()

  3. async deleteMedia(path)

Editorial workflow

  1. async unpublishedEntries()

  2. async unpublishedEntry(collection, slug)

  3. async updateUnpublishedEntryStatus(collection, slug, newStatus)

  4. async publishUnpublishedEntry(collection, slug)

  5. async deleteUnpublishedEntry(collection, slug)

No clear category

  1. async deleteFile(path, commitMessage)

    This method couples the backend interface not only to fs-like storage, but also to GitHub's particular implementation of this API method. This call should be reworked to be in terms of a collection and entry (or just a collection in the case of a single-file collection, if we end up implementing those). It is also confusingly similar to deleteUnpublishedEntry.

Redux store design

const store = {
  collections: {
    [collectionName]: {
      // collection data
    },
    // ...
  },
  entries: {
    entities: {
      [`${collectionName}.${postID}`]: {
        // entry data
      }
    },
    pages: {
      [collectionName]: {
        isFetching: false, // is the page currently being fetched
        ids: [
          // list of post ids - this is concatenated to every time a
          // new page is loaded, regardless of what page it is. This
          // thus ends up being just a full list of the posts in
          // whatever order they were loaded.
        ],
      }
    }
  },
}

New backend design


Goals

  • Statelessness.
  • Should not tie backends to a specific storage implementation.
  • Pagination support should not significantly complicate development of non-paginated backends.

Backends as stateless functions


Instead of storing auth, token, and configuration information in the state of an object, we can pass the required state into the backend when calling it, along with functions that allow the backend to store the state it needs. This state can then be handled by the core code however we want - this would allow us to more easily debug and test backend functions, including the eventual possibility of a backend-agnostic test suite we can provide to assist backend authors. All of that is much harder to do reliably if our API requires the backends to store state themselves.

We can do this by passing a context object as an argument, containing a state value and an asynchronous setState function. setState triggers a Redux action which stores a new backend state in the primary Redux store. This allows us to control backend state by changing the arguments we pass in - a huge benefit for both testing backends and using them in unanticipated ways.

Is there a use-case for changing state to an async function getState?

Additionally, this would make creating a backend as simple as exporting functions and a capabilities object (perhaps along with other metadata) from a JS module. The imported object could be passed directly to CMS.registerBackend.
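A sketch of what such a module might look like (the context shape and all helper names here are proposals for illustration, not an existing API):

// hypothetical-backend.js: a backend as a plain module of stateless
// functions. fetchUser and fetchEntries stand in for real API calls.
const fetchUser = async credentials => ({ name: 'demo', token: credentials.token });
const fetchEntries = async (collection, token) => [];

export const capabilities = {
  pagination: ['next', 'previous'],
};

// Every function receives a context carrying the stored state and a
// setState function that dispatches a Redux action under the hood.
export async function authenticate(context, credentials) {
  const user = await fetchUser(credentials);
  await context.setState({ token: credentials.token, user });
  return user;
}

export async function entriesByFolder(context, collection, extension) {
  const { token } = context.state;
  return fetchEntries(collection, token);
}

The module object itself could then be handed straight to CMS.registerBackend.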

Backend state storage


Since backends will be implemented by arbitrary third parties, we can't allow the core code to depend on any part of the Redux structure that's specific to a given backend. At the same time, we want to move to a stateless backend API that keeps its state in the Redux store, so we can leverage Redux state tracking. We also don't want to store functions in the Redux store, since they can't be serialized - this rules out hiding state with closures, as well as using a Symbol for a unique backend key.

I believe the best way to do this is to create a separate tree in the redux store specifically for backends. It should be considered a bug for any of the core code to rely on the data within this structure directly - if there is any way to enforce that, we should do so. It should be solely for the purposes of storing backend state, which should be considered opaque to the core CMS application. This allows us to store data in Redux without coupling the core code to a specific backend storage layout.

One important restriction on what backends should set as state is that it should remain serializable, to preserve the benefits of being able to serialize our entire Redux store.

Credential elision with stateless backends

One notable concern with saving Redux state is that we don't want to risk exposing a user's credentials. While we don't currently serialize the Redux store at all, we should keep in mind that we'll need some sort of credential elision if we decide to add an export of the Redux state for things like bug reports.

Allowing backends to use Redux actions/reducers

While I think it would be a mistake to require the use of Redux for backend state, I think it would be useful to allow backends to use the actions/reducers pattern for managing their own state.

An easy way to implement this would be to provide a default action and reducer that simply replace the backend state wholesale. Backends that wish to override the default backend reducer could export their own reducer, which will be used if it exists. This would make Redux part of the public API, albeit a small one.
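A minimal sketch of that default pair (the action type and store shape are proposals):

import { Map } from 'immutable';

// Default action creator: replaces the backend's state wholesale.
const setBackendState = newState => ({
  type: 'BACKEND_SET_STATE',
  payload: newState,
});

// Default reducer; a backend wanting finer-grained updates would export
// its own reducer, which would be used in place of this one.
function backendState(state = Map(), action) {
  switch (action.type) {
    case 'BACKEND_SET_STATE':
      return Map(action.payload);
    default:
      return state;
  }
}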

Redux structure

const store = {
  // TODO
}

Configuration

TODO

Init and auth

TODO

Functions

TODO: determine function signatures - these will be mostly the same with the addition of the context argument and a few refactors.

Capabilities


API

A capabilities API for backends would allow them to declare what featureset they support. For instance, GitLab's pagination capabilities might look like the following:

{
  pagination: ["next", "previous", "first", "last"]
}

Since the plan is to use capabilities only for pagination to begin with, GitHub's capabilities would start out as simply an empty object.

The UI would receive the current capabilities as a prop, and could change UI based on what actions are available.
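For example, a collection view could decide which controls to render from that prop alone (a sketch; the component and prop names are illustrative):

import React from 'react';

// Sketch: pagination controls rendered only for declared capabilities.
const PaginationControls = ({ capabilities = {}, onPaginate }) => {
  const actions = capabilities.pagination || [];
  return (
    <div>
      {actions.includes('previous') && (
        <button onClick={() => onPaginate('previous')}>Previous</button>
      )}
      {actions.includes('next') && (
        <button onClick={() => onPaginate('next')}>Next</button>
      )}
    </div>
  );
};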

Pagination

Initially, capabilities will be implemented to handle pagination.

  • Get page size
  • Get number of pages
  • Go to next page
  • Go to previous page
  • Go to first page
  • Go to last page
  • Seek to page number

None of these capabilities can be assumed - BitBucket, for instance, only guarantees that the actual values and the link to the next page will exist.

Search

  • Ties into integration conversation - could a search integration be modeled as a wrapper for any other backend that adds a search backend and the search capability?
  • Should search and relations widget functionality be the same capability or distinct ones? (example use case: an API with search-by-document but no ability to narrow by field)

Media library

  • Do getting and setting media need to be represented distinctly? (example use case: a Git host that doesn't serve image mimetypes correctly)
  • Should there be two APIs specifically for media (the backend media API and the integrations media API?)

Pagination


Features impacted by pagination

I summarized this for @talves in Gitter as follows:

If you think of all the entries of a collection as an array, then you can think of almost all the CMS features as being functions that could be passed to either entries.map(...) or entries.reduce(...). The ones that are essentially entries.map(...) functions (like editing a post, or displaying the metadata for a post) are really simple, and won't be affected by pagination. The ones that are essentially entries.reduce(...) functions (like the relations widget, search, collection sorting, tag list generation, etc.), unless they're supported by a specific backend API (like a search API, for instance), are all affected by pagination, because they all fundamentally require checking every single entry in order to return a result. We either have to find a workaround (like a specific backend API, or the Algolia integration) or we have to load every entry in the CMS itself. The only reason any of those features work right now is because nobody we know of except Smashing has enough entries in a collection for all this to matter.
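A toy illustration of that split:

const visibleEntries = [/* the current page of entries */];
const allEntries = [/* every entry in the collection */];

// "map-like" features only need the entries currently in view:
const titles = visibleEntries.map(entry => entry.data.title);

// "reduce-like" features need every entry, which breaks under pagination
// unless a backend API (e.g. search) does the work server-side:
const allTags = allEntries.reduce(
  (tags, entry) => tags.concat(entry.data.tags || []), []);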

The impact of this is that backends will need some way of communicating their capabilities to the core code, since feature support differences between backends can be small and fine-grained. The "crash on unsupported features" approach we use for backends without editorial workflow support won't work for more subtle differences like "which pagination links are available".

API

The core application will represent pagination state as an opaque cursor. The cursor is simply a value used to pass pagination context to the backend - the core code should never look into or examine the cursor's contents. Cursors should be serializable, so they can be stored in Redux.
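For instance, a GitLab backend might stash its parsed rel links in the cursor (a sketch of possible contents; the core never reads these fields):

// A cursor is just serializable data the backend understands.
const exampleCursor = {
  links: {
    next: 'https://gitlab.example/api/v4/projects/1/repository/tree?page=3',
    previous: 'https://gitlab.example/api/v4/projects/1/repository/tree?page=1',
  },
  actions: ['next', 'previous', 'first', 'last'],
};
// The core only threads this value back into backend.cursor.<ACTION>;
// it never inspects links or actions itself.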

The cursor lifecycle will look something like the following:

  • When a paginated backend endpoint is called, the backend creates and returns a new cursor alongside the results.

  • The core code stores the cursor object as part of the current view state.

  • When the UI is rendered, it is passed function props that allow it to perform backend pagination. These close over the cursor and pass it to the action creator, so the UI itself never has direct access to the cursor. The UI is also passed the static backend capabilities.

  • The UI decides which actions to show based on the backend's capabilities.

  • When the user performs a pagination action, the UI calls the corresponding function prop to create and dispatch that action (the cursor is passed along by calling the function prop, which closed over it earlier).

  • The action creator calls backend.cursor.<ACTION>(cursor) with the cursor, and receives the results and a new cursor.

TODO: how do we handle the count, size, and index?

Page Caching

  1. What do we currently cache?

    We currently treat pagination as if it is a static, lazy list of entries. That is, we don't require fetching all pages up-front, but we don't ever invalidate the pagination of the current request.

  2. Should we cache?

    Since we're leveraging Git-based APIs, I believe that we should cache collection contents as much as possible. Git helps us here by providing SHAs which can be used to tell when the folder underlying a collection has changed.

  3. Where should caching be done, architecturally?

    Since the CMS could conceivably be used with APIs that either cannot or do not provide immutable keys to data, we will want different caching strategies for different backends. Caching should therefore be done in the backends themselves, and be transparent to the core code.

    Caching goes quite a ways past pagination - backends are also responsible for caching auth information, entry contents, etc. Doing page caching in the backend fits our current architecture best.

  4. When can page data be safely cached?

    When it refers to an immutable collection with a unique identity. For instance, if a Git-based API provides SHAs of folders, we can safely cache the pages of that folder, since if the folder changes its SHA will also change. This can be leveraged to reduce bandwidth usage even further if the API supports HEAD requests to get the SHA of a directory or file without the body.

    For instance, a page-cache system for a paginated folder using such an API could work like this:

    • When initially requesting the first loaded page, store the SHA of the currently paginated folder.
    • On loading any page, cache the page contents keyed by that SHA (see the sketch following this list).
    • On each pagination action, make a HEAD request to the currently paginated folder and retrieve its SHA; if it matches the stored one, the cached pages are still valid, otherwise discard them and refetch.
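
    A sketch of that flow (getFolderSha and fetchPage are hypothetical API helpers, not existing CMS code):

    // Sketch: a SHA-keyed page cache for a paginated folder.
    const pageCache = new Map(); // `${sha}:${page}` -> page of entries

    async function loadPage(folder, page) {
      // HEAD request: retrieves the folder's current SHA without a body.
      const sha = await getFolderSha(folder);
      const key = `${sha}:${page}`;
      if (!pageCache.has(key)) {
        pageCache.set(key, await fetchPage(folder, page));
      }
      // If the folder changed, its SHA changed too, so stale entries are
      // simply never looked up again.
      return pageCache.get(key);
    }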

UI

  1. Infinite scrolling

    Infinite scrolling should not be specified as a capability of its own, since it is simply an implementation of various other pagination capabilities (next/previous being the minimum required for 2-way infinite scrolling).

  2. Standard page navigation

    It may be simpler to drop infinite scrolling entirely in favor of explicit paging. It's extremely easy to drop features dynamically with standard paging - you just show a different set of buttons to go to other pages.

Semantic changes


These changes are goals for the new API, but can't be implemented as a simple wrapper without changes to the core code:

  • Backends should be able to trigger display of the authentication dialog.

  • If we pursue composable backends, we'll want to support multiple backends by allowing the default backend to be overridden for specific collections.

  • Composable backends can implement several features which are currently implemented separately:

    • Integrations - the current integrations system could be replaced by composing backends, overriding the desired functionality in a much clearer and more direct way.
    • Media library - if we support multiple backends, the media library could be implemented as a collection backed by its own backend.
  • Ideally, we'll remove "files" and "folder" collections in favor of single-entry and multiple-entry collections, leaving concerns such as paths to be handled by backend-specific configuration.

Examples

TODO

Integrations

The integrations API is also in need of refactoring or elimination (of the API, not the functionality it's implementing). The writeup on that is now located here: #1171

Immediate Goals

All this long-term API design has a short-term purpose: it directly impacts how we'll want to implement pagination for the GitLab API.


Capabilities API for GitLab/BitBucket pagination

For purposes of GitLab and BitBucket support, the capabilities API will only be used for pagination to begin with. Other backends will be modified to return an empty object for capabilities. The capabilities will be passed into the collection pages to allow pagination support, but a comprehensive capabilities system is beyond the scope of the immediate pagination changes.

Cursor implementation

Since the current pagination implementation cannot be neatly refactored to work with the paginated GitLab or BitBucket APIs, the cursor API will be implemented immediately. Existing backends should need minimal or no changes in order to use the cursor API - they'll simply export no pagination capabilities, return any placeholder value for the cursor (since it won't actually be used), and continue returning all entries when requesting the first page.
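In other words, an existing backend's compliance could be as small as this (a sketch; fetchAllEntries is a hypothetical stand-in for the backend's existing listing code):

// Sketch: a non-paginated backend satisfying the cursor API trivially.
export const capabilities = {}; // no pagination capabilities declared

export async function entriesByFolder(collection, extension) {
  const entries = await fetchAllEntries(collection, extension); // hypothetical
  // The cursor is never used when no pagination capabilities are declared,
  // so any serializable placeholder will do.
  return { entries, cursor: {} };
}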

BitBucket

BitBucket will also be updated to use the new capabilities API for pagination. Merging may require some cleanup of the auth code as well, to implement BitBucket's refresh token, but a new auth workflow won't be implemented until this is merged.

New API methods

These will not be implemented by the GitLab or BitBucket backends for now. See the longer-term migration strategy below for more details.

Longer-term migration strategy

Once we've got pagination working and GitLab/BitBucket support merged into master, we'll want to start the work of moving to the new, stateless API (assuming we've decided to move forward with this).


The best way to migrate to this backend API would be to build an object which wraps a backend written to support the new, stateless API and implements the stateful lifecycle expected by the current API (a sketch of such a wrapper follows the list below). The wrapper will likely be quite simple, as it's just providing the state management without any of the Redux details. Once this is done, we'll want to develop a generic backend test suite that can be run on any backend (perhaps with some backend-specific setup code for reproducible tests). This will allow several things:

  • We don't have to change any existing backends until we've proven the new API can work.
  • We're not locking in the shape of the new API.
  • If we decide to change what this API looks like, existing backends using it will continue to work.
  • As we develop new backends against this API, we'll refine the test suite we're using and it will become more robust, allowing us to port more heavily-used backends with greater confidence.
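
A sketch of such an adapter (names are illustrative):

// Sketch: gives a stateless backend module the stateful lifecycle the
// current core expects, holding the state the core would otherwise
// manage via Redux.
class StatefulBackendAdapter {
  constructor(backendModule) {
    this.module = backendModule;
    this.backendState = {};
    const adapter = this;
    this.context = {
      get state() { return adapter.backendState; },
      setState: async newState => { adapter.backendState = newState; },
    };
  }

  // Each old-style stateful method just threads the context through.
  authenticate(credentials) {
    return this.module.authenticate(this.context, credentials);
  }

  entriesByFolder(collection, extension) {
    return this.module.entriesByFolder(this.context, collection, extension);
  }
}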

This won't immediately provide the benefits of the stateless backend structure, such as more robust Redux playback, but we've lived without that so far. It's important to prove that the new design is both robust and usable before we try to move heavily-used, battle-hardened code like the GitHub backend to it.

TODO: expand

@tech4him1
Contributor

I've detailed the existing backend methods in this Gist: https://gist.github.com/tech4him1/9972cf4465e88e7f4e8cc29e6019e8c6.

@lafiosca

I've been attempting to implement an S3-based custom backend, using the reference material from here, @talves, and @tech4him1, and continually digging through netlify-cms-core, netlify-cms-backend-github, and netlify-cms-backend-test. I'm still lacking a lot of clarity, and there seem to be inconsistencies and asymmetries with the existing methods and documentation. I would be interested in participating in discussions about the backend API and how it will change.

@snoopdouglas

Re @lafiosca's efforts: I'm not going to be fully comfortable with using Netlify CMS until it uploads to bog-standard object storage, so to me, this is important.

Has anyone gotten any further than this?

@lafiosca

lafiosca commented Jul 8, 2019

@snoopdouglas I apologize that my spare time for this effort has been reappropriated. I have not been keeping up with the current state of netlify-cms, but anyone else should feel welcome to pick up the torch and run with it as I will not likely get back to it.

@stale

stale bot commented Oct 29, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@erezrokah
Contributor

Closing this as stale
