Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HLRC API for _termvectors #33447

Conversation

mayya-sharipova
Copy link
Contributor

relates to #27205

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@mayya-sharipova mayya-sharipova force-pushed the rest-highlevel-term-vectors-new-response-format branch from 76c90c5 to 2272642 Compare September 6, 2018 15:51
if (docBuilder != null) {
BytesReference doc = BytesReference.bytes(docBuilder);
try (InputStream stream = doc.streamInput()) {
builder.rawField("doc", stream, docBuilder.contentType());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like the name of this method! It makes me think we're copying the field raw without checking the content type but we are checking it.

private final long docVersion;
private final boolean found;
private final long tookInMillis;
private List<TermVector> termVectorList = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a little weird to have just this part be mutable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible for the ConstructingObjectParser to pass in a NamedObjects constructorArg? Right now on line 67 is where its being mutated.

public Term() {};

public static Term fromXContent(XContentParser parser, String term) {
Term t = TERMPARSER.apply(parser, null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe put TERMPARSER into this class?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same for the other parsers, I think.

private float score = -1f;
private List<Token> tokens = null;

public Term() {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this one mutable when the others are immutable? Maybe make all immutable?

Copy link
Contributor

@hub-cap hub-cap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work thus far. Had some comments / nits to address. It looks mostly done tho.

@@ -1122,6 +1123,26 @@ static Request analyze(AnalyzeRequest request) throws IOException {
return req;
}

static Request termVectors(TermVectorsRequest tvrequest) throws IOException {
String endpoint = "_termvectors";
if (tvrequest.getId() != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the EndpointBuilder() gladly accepts nulls in its addPathPart(...) method. You dont need to worry about passing a null value in, as it just wont put it in the path.

}
Request request = new Request(HttpGet.METHOD_NAME, endpoint);
Params params = new Params(request);
if (tvrequest.getRouting() != null) params.withRouting(tvrequest.getRouting());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the params methods all call putParam and that does a null check, and will not add it if its null. So no need to check here.


if (filterSettings != null) {
builder.startObject("filter");
if (filterSettings.containsKey("max_num_terms")) builder.field("max_num_terms", filterSettings.get("max_num_terms"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a helper method here to reduce the copy/paste

private final long docVersion;
private final boolean found;
private final long tookInMillis;
private List<TermVector> termVectorList = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible for the ConstructingObjectParser to pass in a NamedObjects constructorArg? Right now on line 67 is where its being mutated.

}

@Override
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think we need this

import java.util.function.Predicate;


public class TermVectorsResponseTests extends AbstractXContentTestCase<TermVectorsResponse> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when removing the ToXContent, you will have to manually create a parser to test. This could be nasty on such a large class hierarchy. Im sorry, but its the best we have now. We need to come up with a better way. You should be able to move the code out of the response and just use it in a test.

@@ -61,7 +61,7 @@ public DeleteByQueryRequest(String... indices) {
this(new SearchRequest(indices));
}

DeleteByQueryRequest(SearchRequest search) {
public DeleteByQueryRequest(SearchRequest search) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended? I dont see any uses of it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably drop this, yeah.

- use ConstructorObjectParser to make object fields final
@rjernst rjernst removed the review label Oct 10, 2018
- move toXContent section from TermVectorsResponse to
    TermVectorsResponseTests
- refactors docs to use concise version
@mayya-sharipova
Copy link
Contributor Author

@hub-cap @nik9000 I have tried to address your comments, can you please continue the review.

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a bunch of minor stuff but I think it is pretty much done!

this.id = docId;
}

public String getIndex() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you put these below the other ctor?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe add javadoc for them like you did with the other methods?

String[] filterSettingNames =
{"max_num_terms", "min_term_freq", "max_term_freq", "min_doc_freq", "max_doc_freq", "min_word_length", "max_word_length"};
for (String settingName : filterSettingNames) {
if (filterSettings.containsKey(settingName)) builder.field(settingName, filterSettings.get(settingName));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be a class rather than a free form map? It looks like this'd silently not serialize some stuff and that seems bad.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these parameters are optional, and a request doesn't require any of them, that is why I think it is fine to have it as a map.

private final List<TermVector> termVectorList;

public TermVectorsResponse(
String index, String type, String id, long version, boolean found, long tookInMillis, List<TermVector> termVectorList) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you indent this one more time so the parameters don't line up with the body?

this.found = found;
this.tookInMillis = tookInMillis;
if (termVectorList != null) {
Collections.sort(termVectorList, Comparator.comparing(TermVector::getFieldName));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably sort in the parser rather than the ctor. It just feels a little less surprising.


@SuppressWarnings("unchecked")
private static ConstructingObjectParser<TermVectorsResponse, Void> PARSER = new ConstructingObjectParser<>(
"term_vectors", true, args -> new TermVectorsResponse((String) args[0], (String) args[1],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably make this a block with a return statement because that'd make it a bit easier to read. I think.

return totalTermFreq;
}

public Float getScore(){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moar javadoc!

}

private void toXContent(TermVectorsResponse response, XContentBuilder builder) throws IOException {
builder.startObject();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

funny indentation

@@ -1503,6 +1507,120 @@ public void afterBulk(long executionId, BulkRequest request, Throwable failure)
}
}

public void testTermVectors() throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think getting the term vectors is part of CRUD. I don't know what it is, but I don't think it is CRUD. I guess if there isn't any better place, here is good. But maybe leave a comment?

int position = token.getPosition(); // <18>
int startOffset = token.getStartOffset(); // <19>
int endOffset = token.getEndOffset(); // <20>
String payload = token.getPayload(); // <21>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow that is a lot of callouts! Maybe it'd be easier to read the docs if you split it? I'm not sure how that'd look, but 21 callouts is going to be super hard to read!

@@ -61,7 +61,7 @@ public DeleteByQueryRequest(String... indices) {
this(new SearchRequest(indices));
}

DeleteByQueryRequest(SearchRequest search) {
public DeleteByQueryRequest(SearchRequest search) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably drop this, yeah.

@mayya-sharipova
Copy link
Contributor Author

@nik9000 Thanks for such a detailed feedback, Nik. I have addressed all your comments, can you please continue the review.

The only comment I have not addressed is to make filterSettings a class in the TermVectorsRequest. This is because as all the parameters in filterSettings are optional, and a request doesn't require any of them, I think it is fine to have filterSettings as a map.

Copy link
Contributor

@hub-cap hub-cap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one super minor nit. also, wow you picked a giant client call!! Thank you for taking the time to put the new classes into the core folder in client instead of using the existing server ones. Itll be one less thing to do to decouple. <3 <3

* @param docId - id of the document
*/
public TermVectorsRequest(String index, String type, String docId) {
this.index = index;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this(index, type);
this.id = docId;

Sort lists for equality in parsers
@mayya-sharipova
Copy link
Contributor Author

@hub-cap thank you for the review.

@nik9000 Do you still want to make the final round of review, or I can merge this PR?

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mayya-sharipova mayya-sharipova merged commit bf4d90a into elastic:master Oct 24, 2018
@mayya-sharipova mayya-sharipova deleted the rest-highlevel-term-vectors-new-response-format branch October 24, 2018 18:27
mayya-sharipova added a commit that referenced this pull request Oct 24, 2018
mayya-sharipova added a commit that referenced this pull request Oct 24, 2018
mayya-sharipova added a commit that referenced this pull request Oct 24, 2018
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Oct 26, 2018
mayya-sharipova added a commit that referenced this pull request Oct 27, 2018
kcm pushed a commit that referenced this pull request Oct 30, 2018
* HLRC API for _termvectors

relates to #27205
kcm pushed a commit that referenced this pull request Oct 30, 2018
kcm pushed a commit that referenced this pull request Oct 30, 2018
kcm pushed a commit that referenced this pull request Oct 30, 2018
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Nov 2, 2018
mayya-sharipova added a commit that referenced this pull request Nov 4, 2018
mayya-sharipova added a commit that referenced this pull request Nov 5, 2018
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants