Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add multi get api to the high level rest client #27337

Merged
merged 1 commit into from
Jan 16, 2018

Conversation

martijnvg
Copy link
Member

Relates to #27205

@vshank77
Copy link

Can we contribute to this one?

@javanna
Copy link
Member

javanna commented Nov 14, 2017

hi @vshank77 this is almost done already, it only needs review, which I am going to do in the coming days. Maybe you can pick another API from the list in #27205 ?

@martijnvg martijnvg added v6.2.0 and removed v6.1.0 labels Nov 22, 2017
Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot @martijnvg sorry it took me ages to review this. I left some comments and also asked @cbuescher to have a look too.

@@ -283,6 +285,25 @@ public void getAsync(GetRequest getRequest, ActionListener<GetResponse> listener
performRequestAsyncAndParseEntity(getRequest, Request::get, GetResponse::fromXContent, listener, singleton(404), headers);
}

/**
* Retrieves multi documents by id using the Multi Get API
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/multi/multiple ?

}

/**
* Asynchronously retrieves multi documents by id using the Multi Get API
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/multi/multiple ?

}
}

int numberOfRequests = randomIntBetween(0, 32);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens when we add no items?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this test it is fine, but on the server side we do fail if no items have been specified (see MultiGetRequest#validate(...)).

} else {
fetchSourceContext = new FetchSourceContext(false);
}
item.fetchSourceContext(fetchSourceContext);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you can use randomizeFetchSourceContextParams ?

break;
case START_ARRAY:
if (Fields.DOCS.equals(currentFieldName) == false) {
throw new ElasticsearchParseException("Unexpected field [" + currentFieldName + "]");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that throwing exceptions here is a good idea. We should rather be lenient to ensure forward compatibility. Say we add another array at the same level of docs (not likely, but still...) we should not throw error if your own client receives such a response. Rather ignore it.

items.add(new MultiGetItemResponse(getResponse, null));
}
} else {
throw new ElasticsearchParseException("Unexpected token [" + innerToken + "]");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too, throwing is a problem.

// then creating a parser from that is a bit inefficient, but makes the parsing code less
// complex. I think this is the right trade off. Unless we introduce a new xcontent format
// in mget response for the hl rest client that is enabled via a parameter that renders
// an unambiguous format.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes me a bit sad to have to go with maps here, yet I see why it is hard to do it differently.
@cbuescher any idea on making this work without maps? Here is an example response for clarify:

{
  "docs" : [
    {
      "_index" : "my_index",
      "_type" : "my_type",
      "_id" : "1",
      "_version" : 1,
      "found" : true,
      "_source" : {
        "user" : [
          {
            "field1" : 1,
            "field2" : 2
          },
          {
            "field1" : 3,
            "field2" : 4
          }
        ]
      }
    },
    {
      "_index" : "test",
      "_type" : "type",
      "_id" : "2",
      "error" : {
        "root_cause" : [
          {
            "type" : "index_not_found_exception",
            "reason" : "no such index",
            "resource.type" : "index_expression",
            "resource.id" : "test",
            "index_uuid" : "_na_",
            "index" : "test"
          }
        ],
        "type" : "index_not_found_exception",
        "reason" : "no such index",
        "resource.type" : "index_expression",
        "resource.id" : "test",
        "index_uuid" : "_na_",
        "index" : "test"
      }
    }
  ]
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand after a first glace the difference here is that the objects in docs can eithe be failures or sth. that can already be parsed by GetResult#fromXContent(). The only thing the failure items have is an error field. Ideally we could reuse the GetResult parser so it optionally accepts an error field. Unfortunaltely the current GetResult parser isn't easily extendable, but could we add a flag to it that allows optionally parsing the error field and have GetResult contain an optional Failure?
Another option would be to rewrite GetResult#fromXContent to use Object parser, then we could have a common "declareFields(Parser ...)" method that sets up the parser with the parts that GetResult needs and reuse that setup method plus a parser for the "error" field for a specialized MultiGetResult or something.
I think that way we could avoid parsing to a generic map here, which I would also appreciate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@javanna @cbuescher I'll look into changing the GetResult#fromXContent(...) method, so that we don't have to parse into a generic map here.

case EXTERNAL_GTE:
return "external_gte";
case FORCE:
return "force";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just do versionType.toLowercase(Locale.ROOT) ?

MultiGetResponse expected = createTestInstance();
XContentType xContentType = randomFrom(XContentType.values());
BytesReference shuffled = toShuffledXContent(expected, xContentType, ToXContent.EMPTY_PARAMS, false);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that inserting random fields here would reveal problems on the parsing side with the current code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I tried inserting random fields, but then the tests fails, because in GetResult#fromXContentEmbedded(...) in line 294 adds any unknow json field as field to retrieve, this causes the expected response item to be not equal with the actial response item.

} else {
throw new IllegalArgumentException("unsupported token [" + token + "]");
}
return xContentBuilder;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ops, I wonder why we are finding this only now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this because otherwise FetchSourceContextTests test fails. I guess we never shuffled xcontent that had only a top level value (which is valid).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still wonder why this would be a problem. At least as far as I understand this method should only take whole xContent Objects or Arrays, and they should all parse to a map fine (even with just one field). Or are you talking about things like { "foo" }. Is that valid at all? I would like to take a look at the failing test before adding this here, I think this method should rather check that the start token is either Array- or Object-Start and otherwise reject.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or are you talking about things like { "foo" }. Is that valid at all?

Yes, that is valid.

I noticed that we didn't have a isolated unit test for FetchSourceContext. This class serializes a bool value directly without adding an enclosing json object. That is because fetch source context is part of a search request under a json field. This causes FetchSourceContextTests to fail, because shuffleXContent(...) method returned an empty XContentBuilder. This made me change the shuffleXContent(...) method.

I think I can add an assert here to check whether a value is top level and otherwise fail?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look at FetchSourceContext and I think its toXContent() method is kind of problematic in a sense that it sometimes renders a complete object (with start/end Object) and sometimes only a value. Given that FetchSourceContext extends ToXContentObject, I think the later is a case that shouldn't happen. Should we move this out of this PR or is FetchSourceContextTests needed? I don't think I would like to special-case the shuffle methods for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this test is not needed for adding mget api to high level rest client. So 👍 to move this test out and its related changes out of this pr.

@cbuescher cbuescher self-requested a review December 1, 2017 20:09
Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martijnvg @javanna I took a first look regarding the parsing to map and left a first idea, I will also take a look at the rest of the PR later today.

@martijnvg martijnvg force-pushed the hl_client_mget_api branch 2 times, most recently from 17484eb to a21b3ec Compare December 5, 2017 09:27
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Dec 7, 2017
@martijnvg
Copy link
Member Author

@javanna @cbuescher I've changed the MultiGetResponse#fromXContent(...) method to be completely streamable and not use a map to temporarily store all the key/value pairs for each response item.

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martijnvg thanks, I did another round of reviews, I still need to take a short deeper look at the tests and will add those comments later. But the rest looks good already to me minus a few small things.

@@ -123,6 +129,11 @@ public String id() {
return this.id;
}

public Item id(String id) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this setter needed? I experimentally removed it and everything seems fine, in which case I would like to remove it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this can be removed. This was a left over from the parser.

@@ -371,9 +403,9 @@ public MultiGetRequest add(@Nullable String defaultIndex, @Nullable String defau

public static void parseDocuments(XContentParser parser, List<Item> items, @Nullable String defaultIndex, @Nullable String defaultType, @Nullable String[] defaultFields, @Nullable FetchSourceContext defaultFetchSource, @Nullable String defaultRouting, boolean allowExplicitIndex) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be private? Seems only be used from "add" and the other parseDocuments() method.

@@ -493,8 +525,8 @@ public static void parseDocuments(XContentParser parser, List<Item> items) throw
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we need public static void parseDocuments(XContentParser parser, List<Item> items) in the code? I think we can remove this and only leave "add(...)" as an entry point to parsing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The add(...) method is pretty large already, I like not to make it larger than it already is by adding the that is now in parseDocuments(...) to it.


public class MultiGetResponse extends ActionResponse implements Iterable<MultiGetItemResponse>, ToXContentObject {

private static final ParseField INDEX = new ParseField(Fields._INDEX);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we remove the Fields constants and use ParseField.getPreferredName() where we currently use the Strings (like in toXContent())

String id = null;
ElasticsearchException exception = null;
GetResult getResult = null;
item: for (token = parser.nextToken(); token != Token.END_OBJECT; token = parser.nextToken()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, haven't seen breaks with targets in a while. This might be a bit too sneaky, I think I would like pulling out the item parser for better readability.

String currentFieldName = null;
List<MultiGetItemResponse> items = new ArrayList<>();
for (Token token = parser.nextToken(); token != Token.END_OBJECT; token = parser.nextToken()) {
switch (token) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this switch have a default case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there is no default case. Normally I would add a default case that throws an exception, but we should be lenient here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, can you still add an empty default case, maybe with a comment that we ignore those cases? I get a warning here that could be avoided.

ElasticsearchException exception = null;
GetResult getResult = null;
item: for (token = parser.nextToken(); token != Token.END_OBJECT; token = parser.nextToken()) {
switch (token) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, maybe have a default case? Not sure if it needs to do anything (I think parsing should be lenient for the response parsing)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there is no default case. Normally I would add a default case that throws an exception, but we should be lenient here.

break;
case START_OBJECT:
if (ERROR.match(currentFieldName)) {
exception = ElasticsearchException.fromXContent(parser);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it would be possible/worthwhile to augment GetResult (and its parsing method) with an optional exception field and then only use GetResult.fromXContentEmbedded() for both cases. That would add something to GetResult that is unused in the normal Get-API, but it would simplify things here. I'm on the fence about this idea, maybe @javanna can comment on this too.

@martijnvg
Copy link
Member Author

@cbuescher Thanks for the review! I've updated the PR.

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martijnvg Hi, thanks for the updates, I left some super minor nits but other than that looks good to me. I don't know if @javanna wants to give this another look though.

String id = null;
ElasticsearchException exception = null;
GetResult getResult = null;
item: for (Token token = parser.nextToken(); token != Token.END_OBJECT; token = parser.nextToken()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need for the loop label anymore

String currentFieldName = null;
List<MultiGetItemResponse> items = new ArrayList<>();
for (Token token = parser.nextToken(); token != Token.END_OBJECT; token = parser.nextToken()) {
switch (token) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, can you still add an empty default case, maybe with a comment that we ignore those cases? I get a warning here that could be avoided.

ElasticsearchException exception = null;
GetResult getResult = null;
item: for (Token token = parser.nextToken(); token != Token.END_OBJECT; token = parser.nextToken()) {
switch (token) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe empty default wt. comment here as well.

@javanna
Copy link
Member

javanna commented Jan 16, 2018

no need to wait for my final LGTM here, I trust you @cbuescher @martijnvg ! please merge it in, although it has conflicts now unfortunately.

@martijnvg martijnvg force-pushed the hl_client_mget_api branch 2 times, most recently from ec3a2ee to 2b8c1d6 Compare January 16, 2018 14:18
@martijnvg martijnvg merged commit 853f7e8 into elastic:master Jan 16, 2018
@shabtaisharon
Copy link

Is this in a released version? if yes, can you please point me to the documentation?

@cbuescher
Copy link
Member

@shabtaisharon according to the labels on this issue this has been merged to 6.2. However, it seems the documentation was only added in 6.x so far: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/6.x/java-rest-high-document-multi-get.html, but I think that should reflect what is there in 6.2 already.

@shabtaisharon
Copy link

@cbuescher thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants