Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include next_page and prev_page metadata on search #1763

Merged
merged 2 commits into from
Jul 16, 2019

Conversation

sgrif
Copy link
Contributor

@sgrif sgrif commented Jun 10, 2019

As discussed in the May 30th team meeting
(https://discordapp.com/channels/442252698964721669/448525639469891595/583749739166695444),
this is the first step in deprecating/removing numbered pagination for
the "all crates" endpoint. The remaining steps are:

  • Open issues/PRs on bots which have set a user agent that lets us
    identify them, have them start following this link
  • Change our frontend to show "next/prev page" links on the all crates
    page
  • Stop returning the "total" meta item when the next/prev page links
    will be cursor based
  • Switch the links over to cursor based pagination when applicable
  • Error when given pagination parameters that would cause very high
    offsets

Since the problem we're addressing only occurs when the offset is
greater than ~18k, the only place that we currently need to worry about
this is "all crates". There isn't currently any filter you can apply
that will return enough rows for this to be a problem (?letter= is
the most likely, but honestly that is probably the least useful
page/endpoint we have and it might just be worth removing it. Either way
AFAIK it's not hit by crawlers and humans aren't visiting page 900 in
their browsers, so even if we do end up with enough crates for the
offset to possibly be too high, it's unlikely anyone will ever hit it
there).

I'm not entirely sure how I want to structure the final logic for "give
me the next/previous page". I've just stuck it at the bottom of the
search function for now. We'll want something more structured in the
long term, but I'd like to see what this looks like with the cursor
based option before pulling it out into a more general abstraction.

@bors
Copy link
Contributor

bors commented Jun 11, 2019

☔ The latest upstream changes (presumably #1756) made this pull request unmergeable. Please resolve the merge conflicts.

sgrif added 2 commits June 11, 2019 17:31
As discussed in the May 30th team meeting
(https://discordapp.com/channels/442252698964721669/448525639469891595/583749739166695444),
this is the first step in deprecating/removing numbered pagination for
the "all crates" endpoint. The remaining steps are:

- Open issues/PRs on bots which have set a user agent that lets us
  identify them, have them start following this link
- Change our frontend to show "next/prev page" links on the all crates
  page
- Stop returning the "total" meta item when the next/prev page links
  will be cursor based
- Switch the links over to cursor based pagination when applicable
- Error when given pagination parameters that would cause very high
  offsets

Since the problem we're addressing only occurs when the offset is
greater than ~18k, the only place that we currently need to worry about
this is "all crates". There isn't currently any filter you can apply
that will return enough rows for this to be a problem (`?letter=` is
the most likely, but honestly that is probably the least useful
page/endpoint we have and it might just be worth removing it. Either way
AFAIK it's not hit by crawlers and humans aren't visiting page 900 in
their browsers, so even if we do end up with enough crates for the
offset to possibly be too high, it's unlikely anyone will ever hit it
there).

I'm not entirely sure how I want to structure the final logic for "give
me the next/previous page". I've just stuck it at the bottom of the
search function for now. We'll want something more structured in the
long term, but I'd like to see what this looks like with the cursor
based option before pulling it out into a more general abstraction.
@sgrif sgrif force-pushed the sg-include-page-links branch from 73fdf87 to cf5c284 Compare June 11, 2019 23:32
@@ -31,7 +32,7 @@ mod prelude {
crate::util::json_response(t)
}

fn query(&self) -> HashMap<String, String> {
fn query(&self) -> IndexMap<String, String> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IndexMap defaults to the RandomState hasher from std, so this looks fine from a DOS perspective.

@jtgeibel
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Jul 16, 2019

📌 Commit cf5c284 has been approved by jtgeibel

@bors
Copy link
Contributor

bors commented Jul 16, 2019

⌛ Testing commit cf5c284 with merge e1baf64...

bors added a commit that referenced this pull request Jul 16, 2019
Include `next_page` and `prev_page` metadata on search

As discussed in the May 30th team meeting
(https://discordapp.com/channels/442252698964721669/448525639469891595/583749739166695444),
this is the first step in deprecating/removing numbered pagination for
the "all crates" endpoint. The remaining steps are:

- Open issues/PRs on bots which have set a user agent that lets us
  identify them, have them start following this link
- Change our frontend to show "next/prev page" links on the all crates
  page
- Stop returning the "total" meta item when the next/prev page links
  will be cursor based
- Switch the links over to cursor based pagination when applicable
- Error when given pagination parameters that would cause very high
  offsets

Since the problem we're addressing only occurs when the offset is
greater than ~18k, the only place that we currently need to worry about
this is "all crates". There isn't currently any filter you can apply
that will return enough rows for this to be a problem (`?letter=` is
the most likely, but honestly that is probably the least useful
page/endpoint we have and it might just be worth removing it. Either way
AFAIK it's not hit by crawlers and humans aren't visiting page 900 in
their browsers, so even if we do end up with enough crates for the
offset to possibly be too high, it's unlikely anyone will ever hit it
there).

I'm not entirely sure how I want to structure the final logic for "give
me the next/previous page". I've just stuck it at the bottom of the
search function for now. We'll want something more structured in the
long term, but I'd like to see what this looks like with the cursor
based option before pulling it out into a more general abstraction.
@bors
Copy link
Contributor

bors commented Jul 16, 2019

☀️ Test successful - checks-travis
Approved by: jtgeibel
Pushing e1baf64 to master...

@bors bors merged commit cf5c284 into rust-lang:master Jul 16, 2019
@sgrif sgrif deleted the sg-include-page-links branch August 14, 2019 17:28
sgrif added a commit to sgrif/crates.io that referenced this pull request Aug 18, 2019
This continues the changes made in rust-lang#1763. Since that PR was merged, one
of the non-code steps has been taken care of -- All users hitting any
endpoint with `?page=20` (which is an arbitrary search pattern that
seemed high enough to give any crawlers going through pagination) have
been contacted about the change, with a PR opened against any that
included a repo. (Intersting aside, there are *zero* records of this for
any endpoint other than search, which perhaps implies we can get rid of
a few of these endpoints, but that's a separate discussion).

This PR does not change any functionality, but moves some code around
to better encapsulate things for upcoming changes. Specifically:

- Change our frontend to show "next/prev page" links on the all crates
  page
- Stop returning the "total" meta item when the next/prev page links
  will be cursor based (which I'd actually just like to start omitting
  in general)

The main goal of this change was to stop having any code outside of
`Paginated` (which has been renamed to `PaginatedQuery`, as there's a
new type called `Paginated`) care about how pagination occurs. This
means other code can't care about *how* pagination happens (with the
exception of `reverse_dependencies`, which uses raw SQL, and sorta has
to... That was a bit of a rabbit hole, see
diesel-rs/diesel#2150 for details. Given the
low traffic to that endpoint, I think we can safely ignore it).

The distribution of responsibilities is as follows:

- `PaginatedQuery<T>`: Given the query params, decides how to paginate
  things, generates appropriate SQL, loads a `Paginated<T>`.
- `Paginated<T>`: Handles providing an iterator to the records, getting
  the total count (to be removed in the near future), providing the
  next/prev page params
- `Request`: Takes the pagination related query params, turns that into
  an actual URL (note: Due to jankiness in our router, this will only
  ever be a query string, we have no way of getting the actual path)

The next step from here is to change our UI to stop showing page
numbers, and then remove the `total` field.

This PR will introduce a performance regression that was addressed by
 rust-lang#1668. That PR was addressing "this will become a problem in a future",
not "this is about to take the site down". Given the intent to remove
the `total` field entirely, I think it is fine to cause this regression
in the short term. If we aren't confident that the changes to remove
this field will land in the near future, or want to be conservative
about this, I can add some additional complexity/git churn to retain the
previous performance characteristics
bors added a commit that referenced this pull request Sep 5, 2019
Refactor pagination to prepare for cursor-based pagination.

This continues the changes made in #1763. Since that PR was merged, one
of the non-code steps has been taken care of -- All users hitting any
endpoint with `?page=20` (which is an arbitrary search pattern that
seemed high enough to give any crawlers going through pagination) have
been contacted about the change, with a PR opened against any that
included a repo. (Intersting aside, there are *zero* records of this for
any endpoint other than search, which perhaps implies we can get rid of
a few of these endpoints, but that's a separate discussion).

This PR does not change any functionality, but moves some code around
to better encapsulate things for upcoming changes. Specifically:

- Change our frontend to show "next/prev page" links on the all crates
  page
- Stop returning the "total" meta item when the next/prev page links
  will be cursor based (which I'd actually just like to start omitting
  in general)

The main goal of this change was to stop having any code outside of
`Paginated` (which has been renamed to `PaginatedQuery`, as there's a
new type called `Paginated`) care about how pagination occurs. This
means other code can't care about *how* pagination happens (with the
exception of `reverse_dependencies`, which uses raw SQL, and sorta has
to... That was a bit of a rabbit hole, see
diesel-rs/diesel#2150 for details. Given the
low traffic to that endpoint, I think we can safely ignore it).

The distribution of responsibilities is as follows:

- `PaginatedQuery<T>`: Given the query params, decides how to paginate
  things, generates appropriate SQL, loads a `Paginated<T>`.
- `Paginated<T>`: Handles providing an iterator to the records, getting
  the total count (to be removed in the near future), providing the
  next/prev page params
- `Request`: Takes the pagination related query params, turns that into
  an actual URL (note: Due to jankiness in our router, this will only
  ever be a query string, we have no way of getting the actual path)

The next step from here is to change our UI to stop showing page
numbers, and then remove the `total` field.

This PR will introduce a performance regression that was addressed by
 #1668. That PR was addressing "this will become a problem in a future",
not "this is about to take the site down". Given the intent to remove
the `total` field entirely, I think it is fine to cause this regression
in the short term. If we aren't confident that the changes to remove
this field will land in the near future, or want to be conservative
about this, I can add some additional complexity/git churn to retain the
previous performance characteristics
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants