Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addons: make the API endpoint as cacheable as possible #10536

Closed
humitos opened this issue Jul 12, 2023 · 12 comments · Fixed by #10823
Closed

Addons: make the API endpoint as cacheable as possible #10536

humitos opened this issue Jul 12, 2023 · 12 comments · Fixed by #10823
Assignees
Labels
Needed: design decision A core team decision is required

Comments

@humitos
Copy link
Member

humitos commented Jul 12, 2023

Once we are fully migrated to the new Addons, we will be hitting the /_/addons/ endpoint a lot. This endpoint will be hit by every single request on a documentation page view. We would like to make this endpoint as fast as possible, but also try to cache it to avoid hitting our servers. It can be compared with the current Footer API.

The /_/addons/ endpoint needs to be cached by:

Note the endpoint is behind authentication, so that it's not an issue here.

To build the response, the endpoint depends on:

  • Project instance
  • Version instance
  • Build instance

We need to use the "backend resolver" to build the base_url required by the DocDiff addons. That URL depends on the exact filename of the page, so we need cache by url parameter. We could talk about different ideas here to be able to not depend on it if we want to. Some ideas the mentioned:

  • create a new endpoint to use the backend resolver, and call it for docdiff
  • copy / re-implement the "backend resolver" logic in the frontend
  • make the current endpoint "expandable" and only return docdiff data when users enable the docdiff in their docs
  • add a new query string parameter docdiff=[true|false] that's only true on pageviews from *.readthedocs.build

The last item seems to be the easiest one and will give us most of the benefits, since we 95% of our traffic will call the endpoint with docdiff=false 😄

@humitos humitos added the Needed: design decision A core team decision is required label Jul 12, 2023
@github-project-automation github-project-automation bot moved this to Planned in 📍Roadmap Jul 12, 2023
@humitos
Copy link
Member Author

humitos commented Jul 12, 2023

X-RTD-Hosting-Integrations-Version HTTP header
X-RTD-Hosting-Integrations-API-Version HTTP header

We also talked about moving these to query string paramenters because it's easier to cache them than HTTP headers: https://developers.cloudflare.com/cache/how-to/cache-keys/#create-custom-cache-keys

@humitos
Copy link
Member Author

humitos commented Jul 17, 2023

I took a look at the Algolia integration we have setup on https://rtd-with-docsearch.readthedocs.io/en/latest/ and it seems they send all the versions in the query string. Something to consider 😉

Screenshot_2023-07-17_16-59-01

@humitos humitos self-assigned this Sep 4, 2023
humitos added a commit that referenced this issue Sep 19, 2023
Move `X-RTD-Hosting-Integration-Version` header to `api-version` querystring
parameter. This has some benefits:

- make links to this URL clickable (there is no need to use a browser extension
  to send a specific header in the request)
- easier to cache using the default strategy supported by CF

Related #10536
humitos added a commit that referenced this issue Sep 19, 2023
Move `X-RTD-Hosting-Integration-Version` header to `api-version` querystring
parameter. This has some benefits:

- make links to this URL clickable (there is no need to use a browser extension
  to send a specific header in the request)
- easier to cache using the default strategy supported by CF

Related #10536
@humitos humitos moved this from Planned to In progress in 📍Roadmap Sep 19, 2023
@humitos
Copy link
Member Author

humitos commented Sep 19, 2023

We also talked about moving these to query string paramenters because it's easier to cache them than HTTP headers

I implemented this part in #10753 and readthedocs/addons#138

@humitos
Copy link
Member Author

humitos commented Sep 19, 2023

Currently, we are using unresolver.unresolve_url(url) to get the Project and Version objects. I don't think we can easily get rid of the url= parameter since we need to, somehow, communicate to the endpoint the project and version the user is reading.

However, it would be interesting to figure it out how to do this because in that case, it will allow us to cache the request only "per project & version & build 1" instead of "per project & version & build & exact URL" as we are currently doing now. There are some ideas here:

  1. use the Referer HTTP header injected automatically by the browser
  2. inject some meta HTML tags in the CF worker: <meta name="project_slug" content="test-builds"> and <meta name="version_slug" content="latest">. Then our JS client could send them in the request /_/addons/?project_slug=test-builds&version_slug="latest"

I think 1) is probably the easiest and simplest option since 2) seems pretty hacky 😅 . Following that pattern, I think this could be as follows:

  • requests made from regular documentation pages:
    • get the Referer HTTP header and call unresolver.unresolve_url(referer)
    • cache these requests normally (using the project:version CDN tags we are already using)
    • do not return any data that depends on the exact URL
  • requests made on documentation pages that depend on their exact URL (e.g. pull request previews)
    • send the url= attribute with the window.location.href value
    • use the url= attribute to call the unresolver
    • populate response data that depends on the exact URL (e.g. doc_diff.base_url)
    • cache these requests normally. They won't interfere with the previous cached requests because they have an extra url= attribute. They also won't interfere between themselves because they all have different url= values and CF will treat them differently 💪🏼

This sounds pretty simple to understand and follow in my head, but I've been thinking on this pattern for some time already. I'd like to get some feedback on this and decide how we want to move forward.

Note that this will have an impact in our web instances and also in our database. Besides, it will make our addons rendering a lot faster since most of the requests to the endpoint will be cached and returned directly from CF. Currently, the numbers for this endpoint in the last 7 days the response times are:

  • AVG: 129 ms
  • Median: 76.5 ms
  • 95th percentile: 160 ms

Footnotes

  1. the build is automatically fetched (no need to pass it to the endpoint) by querying the latest successful build for that particular version.

humitos added a commit that referenced this issue Sep 20, 2023
Move `X-RTD-Hosting-Integration-Version` header to `api-version` querystring
parameter. This has some benefits:

- make links to this URL clickable (there is no need to use a browser extension
  to send a specific header in the request)
- easier to cache using the default strategy supported by CF

Related #10536
@humitos
Copy link
Member Author

humitos commented Sep 22, 2023

get the Referer HTTP header and call unresolver.unresolve_url(referer)

Hrm, thinking a little more about this, isn't this the same as passing the url= attribute? I mean, instead of vary the cache on the url= we will need to vary it on Referer header, which will have exactly the same behavior, right?

Maybe option 2), even being hacky 🕵️, is the only option that works how we want 😄

@humitos
Copy link
Member Author

humitos commented Sep 22, 2023

Oh, wait! We are already returning X-RTD-project and X-RTD-Version headers to the browser 🚀 . This solves this problem in a much better way 💯

$ curl -sIL https://docs.readthedocs.io/en/stable/ | grep x-rtd-
x-rtd-domain: docs.readthedocs.io
x-rtd-path: /proxito/html/docs/stable/index.html
x-rtd-project: docs
x-rtd-project-method: public_domain
x-rtd-version: stable
x-rtd-version-method: path

With that data we can call our addons API as follows: /_/addons/?project__slug=<X-RTD-Project>&version__slug=<X-RTD-Version>

One thing that I'm not sure how to do is how to get those headers from a request that was done before our readthedocs-addons.js file was loaded. If there is no way to do that, we would need to do an extra HEAD request to get them:

fetch(window.location.href, {method: "HEAD"})
  .then((response) => { 
    const projectSlug = response.headers.get("x-rtd-project"))
    const versionSlug = response.headers.get("x-rtd-version"))
    fetch("/_/addons/?" + new URLSearchParams({ "project__slug": projectSlug, "version__slug": versionSlug})
      .then((response) => {
         // ...
      });
  }
); 

@humitos
Copy link
Member Author

humitos commented Sep 22, 2023

Even doing an extra HEAD request it will be much better. We can re-search if that possible and manage it as an improvement.

With this approach, we will have two different requests:

  1. Including url= parameter: will be made if there is any addons enable on the page that depends on the URL itself. DocDiff addon is a good example of this. Then, on pages that have DocDiff enabled (pull request previews), our JS will hit /_/addons/?project__slug=...&version_slug=... + &url=...
  2. Omitting the url= parameter: if there aren't any addons enabled on the current page that requires sending the exact URL to the API, the request won't include it: /_/addons/?project__slug=...&version__slug=...

With this, our CDN will handle the cache correctly since it will treat the URLs containing only the project__slug and version__slug as the same, no matter what page under the documentation the user is reading. Also, it will treat all the URLs containing url= different for all the different pages of the documentation.

@humitos
Copy link
Member Author

humitos commented Sep 22, 2023

Now, with this, we will need to know before performing the request what addons "could be enabled" on the current page to decide whether or not include the url=. We will need to ask each addon if they could be enabled, and in that case, if they require sending the url= parameter to the backend. Then, if none of the addons that could be enabled on the page needs to send the url= parameter, we don't send it 👍🏼

I imagine something like:

let addons = [ ... ];
let params = {"client-version": "...", "api-version": "..."};
let sendUrlParam = false;
for (const addon of addons) {
  if (addon.requiresURL() && addon.couldBeEnabledOnPage()) {
    sendUrlParam = true;
    break
  }
}

if (sendUrlParam) {
  params.url = window.location.href;
}

...

fetch(url, ....);

Then, an addons like DocDiff would implement:

export class DocDiffAddon extends AddonBase {

  static requiresUrl() {
    return true;
  }

  static couldBeEnabledOnPage() {
    return window.location.host.endsWith(".readthedocs.build");
  }

}

Links to addons code:

@humitos humitos moved this from In progress to Needs review in 📍Roadmap Sep 22, 2023
@ericholscher
Copy link
Member

@humitos I don't love the idea of doing an extra HEAD request, because that will increase latency on the rendering of the addons (2 complete HTTP round trips after page load). If we're already injecting data into the page to add the addons JS, then injecting the value of the project & version slug doesn't seem like additional hackiness, and if it saves us that round trip, likely worth doing.

I don't believe there's a way to capture the headers from the request in our JS, which would be a really nice solution to the problem.

  1. Including url= parameter: will be made if there is any addons enable on the page that depends on the URL itself. DocDiff addon is a good example of this. Then, on pages that have DocDiff enabled (pull request previews), our JS will hit /_/addons/?project__slug=...&version_slug=... + &url=...
  2. Omitting the url= parameter: if there aren't any addons enabled on the current page that requires sending the exact URL to the API, the request won't include it: /_/addons/?project__slug=...&version__slug=...

Is this the plan we're going with, assuming that we get the project & version slug somehow..., either via a HEAD request, or injected into the page? This logic looks 💯 to me. I think the above logic is great, and then we can try injecting the project/version values in the CF worker, and if that doesn't work, always fall back to a HEAD request.

(Or some other idea we haven't come up with yet 🤔... I think perhaps we could also just parse the URL in JS and only send the hostname and matching version path to the API, but that's harder with the custom URLconf work we've been doing, which would be additional data coming from the server that we need to have in the client, which is the primary problem we're trying to solve here..)

@humitos
Copy link
Member Author

humitos commented Oct 11, 2023

Thanks @ericholscher for your input here.

Summarizing, this is the current plan that I will implement then:

  • inject project/version slug on the HTML using <meta> tags via CF worker with the same script we are using to inject the addons js
  • grab project/version slug via our addons js and send them to the API
  • implement the Addon.couldBeEnabledOnPage js function on each addon to decide whether or not send url= attribute
  • send url= attribute if there is at least one addon that has to be enabled on the page

Let me know if you agree with the current plan.

I think perhaps we could also just parse the URL in JS and only send the hostname and matching version path to the API, but that's harder with the custom URLconf work we've been doing

I thought about this already and I didn't see it clearly since we would have to re-implement the Python resolver logic into JavaScript and always keep it updated with all the changes/feature we introduce on it. I thought that would be fragile and we will end up with bugs on both making them behave differently and hard to debug in the end 🤷🏼

@ericholscher
Copy link
Member

@humitos that plan sounds 💯.

@humitos
Copy link
Member Author

humitos commented Oct 16, 2023

inject project/version slug on the HTML using <meta> tags via CF worker with the same script we are using to inject the addons js

This is done and deployed already. See it in action at https://test-builds.readthedocs.io/en/latest/

humitos added a commit to readthedocs/addons that referenced this issue Oct 16, 2023
Follow the plan discussed in
readthedocs/readthedocs.org#10536

Grab the `project-slug` and `version-slug` from the `meta` tag injected by CF
and send it back to the backend. If there is any addons that can be enabled in
the current page that depends on the URL, it will send it also.

Each Addon class should implement `requireUrlParam` to decide whether or not to
send the `url=` to the backend API or not.
humitos added a commit that referenced this issue Oct 16, 2023
`/_/addons/` API endpoint now accepts `project-slug` and `version-slug`.
This is useful to be able to cache these requests in our CDN.

It still supports sending `url=` when necessary for those requests that relies
on the full URL to generate specific data in the response.

The NGINX config is updated to inject `meta` tags with project/version slugs,
emulating what our CF worker does on production.

Closes #10536
humitos added a commit to readthedocs/addons that referenced this issue Oct 24, 2023
Follow the plan discussed in
readthedocs/readthedocs.org#10536

Grab the `project-slug` and `version-slug` from the `meta` tag injected
by CF and send it back to the backend. If there is any addons that can
be enabled in the current page that depends on the URL, it will send it
also.

Each Addon class should implement `requireUrlParam` to decide whether or
not to send the `url=` to the backend API or not.

API changes in:
readthedocs/readthedocs.org#10823
@github-project-automation github-project-automation bot moved this from Needs review to Done in 📍Roadmap Oct 24, 2023
humitos added a commit that referenced this issue Oct 24, 2023
* Addons: accept `project-slug` and `version-slug` on endpoint

`/_/addons/` API endpoint now accepts `project-slug` and `version-slug`.
This is useful to be able to cache these requests in our CDN.

It still supports sending `url=` when necessary for those requests that relies
on the full URL to generate specific data in the response.

The NGINX config is updated to inject `meta` tags with project/version slugs,
emulating what our CF worker does on production.

Closes #10536

* Add tests and update logic for API permissions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needed: design decision A core team decision is required
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants