Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate from callbacks to async/await #3559

Merged
merged 58 commits into from
Mar 7, 2019
Merged

Migrate from callbacks to async/await #3559

merged 58 commits into from
Mar 7, 2019

Conversation

muxator
Copy link
Contributor

@muxator muxator commented Feb 21, 2019

This PR represents the line of work that refactors Etherpad to use async/await calls instead of callbacks, vastly simplifying the code, as per #3540.

This represents a vast change, and by its nature requires a lot of testing, hence its WIP status. It is currently deployed on https://beta.etherpad.org. It will be merged as soon as deemed sufficiently stable.

It has the priority over other work: present and future PRs, in case of conflicting changes, will need to rebase themselves over this work.

The development is by contributor @raybellis.
The branch will be periodically force-pushed if needed, please update often.

Since this code can end up loaded in browsers when using client side plugins,
avoid use of ES6 syntax features such as arrow functions until MSIE support is
finally dropped.
…tall(), uninstall()

We cannot use arrow functions in this file, because code in /src/static can end
up being loaded in browsers, and we still support IE11.
Also fixed a bug where the system would make a request to the central server for
the plugin list for every search even if the list was already cached.
Use real `async` instead of async.js where applicable.
The `getPluginTests()` function was never truly async anyway because it only
contains calls to synchronous `fs` modules.
`getPadAccess()` (src/node/padaccess.js) is now "promise only", resolving to
`true` or `false` as appropriate, and throwing an exception if there's an
error.

The two call sites (padreadonly.js and importexport.js) updated to match.
Promisified methods:
  - get()
  - set()
  - findKeys()
  - getSub()
  - setSub()
  - remove()
  - doShutdown()
Removed the 's' for consistency with the other `doesFooExist()` manager calls.
Retained an alias for plugins that might be using it.
Removed the 's' for consistency with the other `doesFooExist()` manager calls.
Retained an alias for plugins that might be using it.
The function is now iterative rather than recursive.
Converted those functions that API.js still depends on, and others that at this
point are never called via the nodeback mechanism.
This patch also contains significant refactoring relating to error checking of
arguments supplied to the functions (e.g. rev) facilitated by use of `throw`
instead of nodeback errors.
Also converted the handler functions that depend on checkAccess() into async
functions too.

NB: this commit needs specific attention to it because it touches a lot of
security related code!
also fixes a missing await calling `.createAuthor` in db/Pad.js
- removed possible issue with failing to sanitize `padName` if `padId` was also
  supplied
- removed unnecessary `try` block
- simplified API and function name matching tests
raybellis and others added 13 commits January 31, 2019 11:14
This change is only cosmetic. Its aim is do make it easier to understand the
async changes that are going to be merged later on. It was extracted from the
original work from Ray Bellis.

To verify that nothing has changed, you can run the following command on each
file touched by this commit:
  npm install uglify-es
  diff --unified <(uglify-js --beautify bracketize <BEFORE.js>) <(uglify-js --beautify bracketize <AFTER.js>)



This is a complete script that does the same automatically (works from a
mercurial clone):

```bash
#!/usr/bin/env bash

set -eu

REVISION=<THIS_REVISION>

PARENT_REV=$(hg identify --rev "${REVISION}" --template '{p1rev}')
FILE_LIST=$(hg status --no-status --change ${REVISION})
UGLIFYJS="node_modules/uglify-es/bin/uglifyjs"

for FILE_NAME in ${FILE_LIST[@]}; do
  echo "Checking ${FILE_NAME}"
  diff --unified \
    <("${UGLIFYJS}" --beautify bracketize <(hg cat --rev "${PARENT_REV}" "${FILE_NAME}")) \
    <("${UGLIFYJS}" --beautify bracketize <(hg cat --rev "${REVISION}"   "${FILE_NAME}"))
done
```
Changed two occurrences of:

  process.nextTick(function() {
     if (fn) fn();
  });

with

  if (fn) {
    process.nextTick(fn);
  }

i.e. such that no function even gets added to the `nextTick` queue unless
there's actually a function to be called.

Extracted from Ray's work.
…clear() and length()

1. This module was not migrated to Promises, because it is only used via
   express-session, which can't actually use promises anyway.

2. all(), clear() and length() depend on the presence of the `db.forEach()`
   function, which in ueberdb2 doesn't even exist.

   Fortunately those three methods are optional, so I made their existence
   conditional on the presence of `db.forEach`.

3. in SessionStore.clear(), replaced a call to db.db.remove() with db.remove()
PadManager.sanitizePadId() can't use thenify: single arg callback
This change extracts the grammar correction performed on the async branch,
anticipating them in a single commit. It cannot be folded with the previous
one, as it is not purely cosmetic.
This change is in preparation of the future async refactoring by Ray. It tries
to extract as many changes in boolean conditions as possible, in order to make
more evident identifying eventual logic bugs in the future work.

This proved already useful in at least one case.

BEWARE: this commit exposes an incoherency in the DB API, in which, depending
on the driver used, some functions can return null or undefined. This condition
will be externally fixed by the final commit in this series ("db/DB.js: prevent
DB layer from returning undefined"). Until that commit, the code base may have
some bugs.
@muxator muxator added the async-migration Migration from callback-style programming to async functions label Mar 3, 2019
@muxator
Copy link
Contributor Author

muxator commented Mar 3, 2019

Hi, the PR is almost ready to be integrated. I just need a review from @ray.

While writing async, some boolean expressions where made stricter (e.g. == -> ===, != -> !==). This is generally a very good thing, but on a code base with lots of technical debt like ours, this can resurface latent bugs (this happened at least once).

  1. For this reason, I have tried to extract & centralize as many such changes as possible in a single commit, namely d520e28. These changes were all present in Ray's original work.

  2. While doing this, I noticed that in some files this transformation was not symmetrical (some places were made stricter, others were not). I've done another commit (8ecd0dd) which tentatively makes the situation even. None of these changes were part of Ray's work, yet I think we could integrate them.

I need a review on points 1 and 2. If Ray recognizes 1 as his beloved brainchild, and thinks that 2 may complete it, I'll fold the two changes and merge the PR.

@raybellis
Copy link
Contributor

I think your author === null check may result in a similar to the bug that I caused inadvertently with the SessionManager.

If you like, I'm happy to audit all calls to db.get* and ensure that they work with either undefined or null values, given that we can't be sure which will be returned by ueberDB.

I was going to suggest that just testing for if (!value) would suffice, but on reflection that may cause issues if the result value can legally be 0 or "", so an explicit check for val === null || val === undefined may be preferred. The alternative is to explicitly require that loose checks are annotated to indicate that they are intentionally loose.

@muxator
Copy link
Contributor Author

muxator commented Mar 4, 2019

That's true: using author === null could introduce regressions.

However, in its current status on async (async-PR closely tracks it) AuthorManager.js has only strict checks (X !== null or X === null) except for the one in mapAuthorWithDBKey().

The main pain point is probably ueberDB, but I do not want to digress.

On commenting lax comparison: in async-PR, I agree, indeed I had already put such a comment in SessionManager.js:

/*
   * In some cases, the db layer could return "undefined" as well as "null".
   * Thus, it is not possible to perform strict null checks on group2sessions.
   * In a previous version of this code, a strict check broke session
   * management.
   *
   * See: https://github.com/ether/etherpad-lite/issues/3567#issuecomment-468613960
   */
if (!group2sessions || !group2sessions.sessionIDs) {

I think we have to decide what to do with AuthorManager, as well. All the original comparisons were lax, but if you have a look at d520e28 (which, I remember, just extracts changes made on async), all but one were made strict. Could this introduce latent bugs, as well?

If you like, I'm happy to audit all calls to db.get* and ensure that they work with either undefined or null values, given that we can't be sure which will be returned by ueberDB.

That would be welcome: in that case we could reach a more coherent situation.

If you do not have time, or if we want to postpone this, I can also make a ugly proposal in the name of pragmatism:

  • put lax comparisons wherever the old code base was already lax. This would bring us nearer to bug-for-bug compatibility
  • put comments explaining our reasoning
  • merge the PR so we can start moving forward

If you think it makes sense, I can compare develop with the asynchronous version, and worsen its purity in the name of compatibility.

@raybellis
Copy link
Contributor

I've got time to do that audit. How (and where) would you prefer to see any relevant commits made?

@raybellis
Copy link
Contributor

For reference, here's a comparison of ueberDB2's behaviour with dirty vs mysql:

DirtyDB

get undefined (random) key
-> undefined
set explicit value and verify
-> pass
set value to undefined
-> undefined
set value to null
-> undefined

MySQL

get undefined (random) key
-> null
set explicit value and verify
-> pass
set value to undefined and get
-> undefined
set value to null and get
-> null

ueberDB2 can return either undefined or null for a missing key, depending on
which DB driver is used. This patch changes the promise version of the API so
that it will always return null.
@raybellis
Copy link
Contributor

I found an alternate solution - see 32cebc4

@muxator
Copy link
Contributor Author

muxator commented Mar 7, 2019

Nice. Pulled in with ac7663c.

@muxator muxator marked this pull request as ready for review March 7, 2019 00:36
@muxator muxator merged commit 4c45ac3 into ether:develop Mar 7, 2019
@muxator
Copy link
Contributor Author

muxator commented Mar 7, 2019

It's done.

I think it's worth citing from the commit message:

Merge pull request #3559 from raybellis/async-PR

With this commit, that closes #3540, we pay the first big slice of our technical debt. In this line of work we streamlined the code base, reducing its size by 15-20% and making it more understandable at the same time.

The changes were audited and tested collaboratively and are deemed sufficiently stable for being merged.

Known issues:

  • plugin compatibility is still not perfect
  • the error handling path needs to be improved

This is an important day for Etherpad: thanks, Ray!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
async-migration Migration from callback-style programming to async functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants