Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move states of search to coordinating node #52741

Merged
merged 4 commits into from
Mar 3, 2020

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Feb 25, 2020

This commit moves the states of search to the coordinating node instead of keeping them in the data node. Changes in this commit:

  1. Include ShardSearchRequest to QuerySearchResult
  2. Resend ShardSearchRequest in step 1 in the fetch phase so we can recreate a SearchContext from ShardFetchRequest
  3. Include AggregatedDfs to ShardFetchRequest as we need it in both the query and fetch phase
  4. Include RecoreDocIds to QuerySearchResult and send it back in the fetch phase

Relates #46523

I ran a few benchmarks for this change on a single node with http_logs, geonames, and geopoint. There was no regression. I will try to benchmark it with multiple nodes.

@dnhatn dnhatn added the :Search/Search Search-related issues that do not fall into other categories label Feb 25, 2020
@dnhatn dnhatn requested a review from jimczi February 25, 2020 03:49
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

@dnhatn
Copy link
Member Author

dnhatn commented Feb 26, 2020

testHistoryIsWrittenWithFailure

@elasticmachine test this please

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the approach here. The backward compatibility layer is simple and efficient since the main logic is unchanged (we rebuild the search context in all cases). I also find it nice that we send the rewritten shard request back and forth between phases. This effectively moves the state of the search to the coordinating node.
I left one comment that could be addressed in a follow up. The rest looks good to me.

plugins.add(PercolatorPlugin.class);
return plugins;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change needed for this pr ? It seems unrelated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that MockNioTransport does not register NamedWriteables from the bundled plugins. We need to manually register the percolate plugin in the test as we now de/serialize the Percolate QueryBuilder between search phases. I looked into how to fix this issue, but it should be in a follow-up.

public void setRescoreDocIds(RescoreDocIds rescoreDocIds) {
this.rescoreDocIds = rescoreDocIds;
}

@Override
public void writeTo(StreamOutput out) throws IOException {
// TODO: this seems wrong, SearchPhaseResult should have a writeTo?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could apply the TODO here and clean the (de)serialization ? This would mean reconciling the logic between the different implementations and move the serialization of the object accessible here to this function. Happy to do this in a follow up though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++. I will do this in a follow-up.

@dnhatn
Copy link
Member Author

dnhatn commented Mar 3, 2020

@jimczi Thank you for your review.

@dnhatn dnhatn merged commit 206381e into elastic:reader-context Mar 3, 2020
@dnhatn dnhatn deleted the stateless-search-context branch March 3, 2020 19:33
dnhatn added a commit that referenced this pull request Aug 25, 2020
This commit introduces a new API that manages point-in-times in x-pack 
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.

A point in time must be opened before being used in search requests. The 
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.

```
POST /my_index/_pit?keep_alive=1m
```

The response from the above request includes a `id`, which should be 
passed to the `id` of the `pit` parameter of search requests.

```
POST /_search
{
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "pit": {
            "id":  "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
            "keep_alive": "1m"
    }
}
```

Point-in-times are automatically closed when the `keep_alive` is 
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.

```
DELETE /_pit
{
    "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```

#### Notable works in this change:

- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966

Relates #46523
Relates #26472

Co-authored-by: Jim Ferenczi <[email protected]>
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Sep 10, 2020
This commit introduces a new API that manages point-in-times in x-pack
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.

A point in time must be opened before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.

```
POST /my_index/_pit?keep_alive=1m
```

The response from the above request includes a `id`, which should be
passed to the `id` of the `pit` parameter of search requests.

```
POST /_search
{
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "pit": {
            "id":  "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
            "keep_alive": "1m"
    }
}
```

Point-in-times are automatically closed when the `keep_alive` is
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.

```
DELETE /_pit
{
    "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```

#### Notable works in this change:

- Move the search state to the coordinating node: elastic#52741
- Allow searches with a specific reader context: elastic#53989
- Add the ability to acquire readers in IndexShard: elastic#54966

Relates elastic#46523
Relates elastic#26472

Co-authored-by: Jim Ferenczi <[email protected]>
dnhatn added a commit that referenced this pull request Sep 10, 2020
This commit introduces a new API that manages point-in-times in x-pack
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.

A point in time must be opened before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.

```
POST /my_index/_pit?keep_alive=1m
```

The response from the above request includes a `id`, which should be
passed to the `id` of the `pit` parameter of search requests.

```
POST /_search
{
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "pit": {
            "id":  "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
            "keep_alive": "1m"
    }
}
```

Point-in-times are automatically closed when the `keep_alive` is
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.

```
DELETE /_pit
{
    "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```

#### Notable works in this change:

- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966

Relates #46523
Relates #26472

Co-authored-by: Jim Ferenczi <[email protected]>
dnhatn added a commit that referenced this pull request Apr 3, 2021
Previously, the search states are stored in ReaderContext on data nodes.
Since 7.10, we send them to the coordinating node in a QuerySearchResult
of a ShardSearchRequest and the coordinating node then sends them back
in ShardFetchSearchRequest. We must keep the search states in data nodes
unless they are sent back in the fetch phase. We used the channel version
to determine this guarantee. However, it's not correct in CCS requests in 
mixed clusters.

1. The coordinating node of the local cluster on the old version sends a
ShardSearchRequest to a proxy node of the remote cluster on the new
version. That proxy node delivers the request to the data node. In this
case, the channel version between the data node and the proxy node is 
>= 7.10, but we won't receive the search states in the fetch phase as they
are stripped out in the channel between the old coordinating node and
the new proxy.

```
[coordinating node v7.9] --> [proxy node v7.10] --> [data node on v7.10]
```

2. The coordinating node of the local on the new version sends a
ShardSearchRequest to a proxy node of the remote cluster on the new
version. However, the coordinating node sends a ShardFetchSearchRequest
to another proxy node of the remote cluster that is still on an old
version. The search states then are stripped out and never reach the
data node.

```
-> query phase: [coordinating node v7.10] --> [proxy node v7.10] --> [data node on v7.10]
-> fetch phase: [coordinating node v7.10] --> [proxy node v7.9] --> [data node on v7.10]
```

This commit fixes the first issue by explicitly serializing the channel version in 
a ShardSearchRequest and the second by continue storing the search states 
in ReaderContext unless all nodes are upgraded.

Relates #52741
dnhatn added a commit that referenced this pull request Apr 3, 2021
Previously, the search states are stored in ReaderContext on data nodes.
Since 7.10, we send them to the coordinating node in a QuerySearchResult
of a ShardSearchRequest and the coordinating node then sends them back
in ShardFetchSearchRequest. We must keep the search states in data nodes
unless they are sent back in the fetch phase. We used the channel version
to determine this guarantee. However, it's not correct in CCS requests in 
mixed clusters.

1. The coordinating node of the local cluster on the old version sends a
ShardSearchRequest to a proxy node of the remote cluster on the new
version. That proxy node delivers the request to the data node. In this
case, the channel version between the data node and the proxy node is 
>= 7.10, but we won't receive the search states in the fetch phase as they
are stripped out in the channel between the old coordinating node and
the new proxy.

```
[coordinating node v7.9] --> [proxy node v7.10] --> [data node on v7.10]
```

2. The coordinating node of the local on the new version sends a
ShardSearchRequest to a proxy node of the remote cluster on the new
version. However, the coordinating node sends a ShardFetchSearchRequest
to another proxy node of the remote cluster that is still on an old
version. The search states then are stripped out and never reach the
data node.

```
-> query phase: [coordinating node v7.10] --> [proxy node v7.10] --> [data node on v7.10]
-> fetch phase: [coordinating node v7.10] --> [proxy node v7.9] --> [data node on v7.10]
```

This commit fixes the first issue by explicitly serializing the channel version in 
a ShardSearchRequest and the second by continue storing the search states 
in ReaderContext unless all nodes are upgraded.

Relates #52741
dnhatn added a commit that referenced this pull request Apr 4, 2021
Previously, the search states are stored in ReaderContext on data nodes.
Since 7.10, we send them to the coordinating node in a QuerySearchResult
of a ShardSearchRequest and the coordinating node then sends them back
in ShardFetchSearchRequest. We must keep the search states in data nodes
unless they are sent back in the fetch phase. We used the channel version
to determine this guarantee. However, it's not correct in CCS requests in 
mixed clusters.

1. The coordinating node of the local cluster on the old version sends a
ShardSearchRequest to a proxy node of the remote cluster on the new
version. That proxy node delivers the request to the data node. In this
case, the channel version between the data node and the proxy node is 
>= 7.10, but we won't receive the search states in the fetch phase as they
are stripped out in the channel between the old coordinating node and
the new proxy.

```
[coordinating node v7.9] --> [proxy node v7.10] --> [data node on v7.10]
```

2. The coordinating node of the local on the new version sends a
ShardSearchRequest to a proxy node of the remote cluster on the new
version. However, the coordinating node sends a ShardFetchSearchRequest
to another proxy node of the remote cluster that is still on an old
version. The search states then are stripped out and never reach the
data node.

```
-> query phase: [coordinating node v7.10] --> [proxy node v7.10] --> [data node on v7.10]
-> fetch phase: [coordinating node v7.10] --> [proxy node v7.9] --> [data node on v7.10]
```

This commit fixes the first issue by explicitly serializing the channel version in 
a ShardSearchRequest and the second by continue storing the search states 
in ReaderContext unless all nodes are upgraded.

Relates #52741
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Search/Search Search-related issues that do not fall into other categories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants