Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gateway: Small improvements to error messages and behavior during updates. #3811

Merged
merged 29 commits into from
Mar 6, 2020

Conversation

abernix
Copy link
Member

@abernix abernix commented Feb 20, 2020

These are just a couple small improvements to the behavior of Apollo Gateway during error conditions which arise while attemtping to update the schema.

Most notably, this removes a previously trapped error which should be propogated up the call stack.

I believe these small bits are the very lowest hanging of fruit and that these updates are orthongonal/complimentary to additional retry functionality and improved usability of the Gateway which is something @trevor-scheer is currently looking at.

@abernix abernix force-pushed the abernix/gateway-minor-qol-improvements branch 4 times, most recently from 2efa066 to b8901dc Compare February 27, 2020 16:06
@trevor-scheer trevor-scheer changed the base branch from release-2.11.0 to release-2.12.0 March 3, 2020 19:26
abernix added 17 commits March 3, 2020 11:27
This message was being issued when a new server starts up, prior to ever having
a schema, when a storage secret (or any other artifact) can't be fetched
from GCS (or any error within `updateServiceDefinitions`) despite the fact
that there may be no `previousSchema`.

While I could use `previousSchema` to change the error message, I don't
think that it's this methods concern to decide what happens in the event of
an error.  I think it's only this methods concern to actually do the update,
if it is in fact successful.
Currently, we log the error and just return without throwing which causes
`load` (one of two places where `updateComposition` is called to not
actually fail and follow-up logic then suggests "Gateway successfully loaded
schema", even though that cannot be true.

The nice thing about this change (in addition to allowing someone to `catch`
an error from `ApolloGateway.prototype.load`) is that this should bubble all
the way up to the place where we currently call `load` within Apollo
Server's constructor, and stop the server from starting in this failure
condition:

https://github.com/apollographql/apollo-server/blob/7dcee80ff33061c0911458d593ebbca5a9c73939/packages/apollo-server-core/src/ApolloServer.ts#L366

The other place we call `updateComposition` is within a `setTimeout`.  That
particular case is less troublesome since it will retry on the next interval.
No need to have them so far away from their usage and to even do these
assignments at all when there are already escape routes that exit this code
path.
* Refer to the thing we are processing consistently as "schema definitions".
* Also make them generally more factual.
Rely on `message` only if it's present, and fall back to serialization
methods which might exist on the prototype otherwise (e.g. `toJSON`,
`toString`).

Also, switch to a single parameter usage of the logging facility.  While
`console.log` supports it, its certainly possible that the logger will
_not_, and will need that positional parameter for something else.
While successful `updateComposition` invocations were working properly,
failed invocations (including GCS access errors, network hiccups and just
general configuration mistakes) were currently cluttering the logs with
warnings of unhandled promise rejections.

While those unhandled rejections did include the actual error messages, this
adjusts them to be caught and logged in a structured manner with our
`logger`, sparing our logs from those unnecessarily verbose (and scary!)
messages.
Adjusting the typings only works for users of TypeScript.  On the other
hand, this is a one-time assertion which happens at instantiation time which
could save a JavaScript developer from the pain of not knowing what's going
on!

There might be typing improvements to be had, but claiming it as an alternate
approach to handling this isn't correct.  The typings can still be improved,
of course.
Overall, the success/failure behavior should be expected to be similar for
all of these requests, as they all access the same GCS store.
Sometimes, the error object which is being caught here is in fact not a
misconfiguration of server options, but rather an error which was thrown in
a promise chain.

By appending the "Invalid options provided to ApolloServer" string to the
error object's `message` property in this method that is called on _every_
request to the server (`runHttpQuery`), we're risking appending the same
prefix to the same error over and over (i.e. "a: a: a: a: a: b").

While this prefix may have been true at one point, it is no longer possible
to enforce in a world where the schema is obtained asynchronously.
…veries.

Previously, if the initial call to `load` failed, but its intention (fetching a
federated schema configuration in managed federation) is eventually
accomplished via a subsequent fetch (via setInterval polling), the
`executor` would not be set.

This resulted in a continued failure even if the `schema` was eventually set
since federated `schema`s require the Gateway's `executor` to do pull off
their much more complex (remote!) execution strategy!

The solution was simple since `executor` was already present on the actual
`ApolloGateway`, but that required exposing that property as a valid type to
access from the interface that `ApolloGateway` implements: `GraphQLService`.

I don't see why a `GraphQLService` wouldn't have an executor, so it seemed
appropriate to add, particularly since our only `GraphQLService` is the
`ApolloGateway` class itself.
In particular, this blocks any rejected promises which may come from GCS
load prior to returning them to the `schemaDerivedData` promise (which is
where the `initSchema` method assigns the result to).

Failure to do this results in the server middleware sending the error
directly to the client.
…ils.

Another approach to this would be to throw an error here, but this is our
only opportunity to allow the server to recover after initial failure.

On the one hand, this approach is more graceful, but on the other hand,
perhaps we actually want initial failure (after timeouts elapse) to not bind
the other middleware.  Either way, the server doesn't just `process.exit`
right now, and with certain non-Node.js environments, it may be worthwhile
to operate in this mode.
…cts.

Because we want the actual executor to work when a schema might eventually
become available, as it may when `onSchemaChange` hooks eventually succeed.
@trevor-scheer trevor-scheer force-pushed the abernix/gateway-minor-qol-improvements branch from be8e5a8 to 4f2fbff Compare March 3, 2020 19:27
@abernix abernix added this to the Release 2.12.0 milestone Mar 6, 2020
@abernix abernix merged commit 2094947 into release-2.12.0 Mar 6, 2020
abernix added a commit that referenced this pull request Mar 6, 2020
Previously, when attempting to compose a schema from a downstream service in
unmanaged mode, the unavailability of a service would not cause composition
to fail.

Given a condition when the remaining downstream services are still
composable (e.g. they do not depend on the unavailable service and it does
not depend on them), this could still render a valid, but unintentionally
partial schema.

While a partial schema is in many ways fine, it will cause any client's
queries against that now-missing part of the graph to suddenly become
queries which will no longer validate, despite the fact that they may have
previously been designed to fail gracefully during degradation of the service.

Rather than simply logging errors with `console.error` in those conditions,
we will now `throw` the errors.  Thanks to changes in the upstream invokers'
error handling (e.g.#3811),
this `throw`-ing will now prevent unintentionally serving an incomplete graph.
abernix added a commit that referenced this pull request Mar 6, 2020
Previously, when attempting to compose a schema from a downstream service in
unmanaged mode, the unavailability of a service would not cause composition
to fail.

Given a condition when the remaining downstream services are still
composable (e.g. they do not depend on the unavailable service and it does
not depend on them), this could still render a valid, but unintentionally
partial schema.

While a partial schema is in many ways fine, it will cause any client's
queries against that now-missing part of the graph to suddenly become
queries which will no longer validate, despite the fact that they may have
previously been designed to fail gracefully during degradation of the service.

Rather than simply logging errors with `console.error` in those conditions,
we will now `throw` the errors.  Thanks to changes in the upstream invokers'
error handling (e.g.#3811),
this `throw`-ing will now prevent unintentionally serving an incomplete graph.
abernix added a commit that referenced this pull request Mar 11, 2020
Previously, when attempting to compose a schema from a downstream service in
unmanaged mode, the unavailability of a service would not cause composition
to fail.

Given a condition when the remaining downstream services are still
composable (e.g. they do not depend on the unavailable service and it does
not depend on them), this could still render a valid, but unintentionally
partial schema.

While a partial schema is in many ways fine, it will cause any client's
queries against that now-missing part of the graph to suddenly become
queries which will no longer validate, despite the fact that they may have
previously been designed to fail gracefully during degradation of the service.

Rather than simply logging errors with `console.error` in those conditions,
we will now `throw` the errors.  Thanks to changes in the upstream invokers'
error handling (e.g.#3811),
this `throw`-ing will now prevent unintentionally serving an incomplete graph.
@abernix abernix deleted the abernix/gateway-minor-qol-improvements branch March 16, 2020 19:18
abernix added a commit to apollographql/federation that referenced this pull request Sep 4, 2020
abernix added a commit to apollographql/federation that referenced this pull request Sep 4, 2020
…l/abernix/gateway-minor-qol-improvements

gateway: Small improvements to error messages and behavior during updates.
Apollo-Orig-Commit-AS: apollographql/apollo-server@2094947
abernix added a commit to apollographql/federation that referenced this pull request Sep 4, 2020
…llographql/apollo-server#3867)

Previously, when attempting to compose a schema from a downstream service in
unmanaged mode, the unavailability of a service would not cause composition
to fail.

Given a condition when the remaining downstream services are still
composable (e.g. they do not depend on the unavailable service and it does
not depend on them), this could still render a valid, but unintentionally
partial schema.

While a partial schema is in many ways fine, it will cause any client's
queries against that now-missing part of the graph to suddenly become
queries which will no longer validate, despite the fact that they may have
previously been designed to fail gracefully during degradation of the service.

Rather than simply logging errors with `console.error` in those conditions,
we will now `throw` the errors.  Thanks to changes in the upstream invokers'
error handling (e.g.apollographql/apollo-server#3811),
this `throw`-ing will now prevent unintentionally serving an incomplete graph.
Apollo-Orig-Commit-AS: apollographql/apollo-server@2562ad3
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants