-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature][Connector-V2] HTTP supports page increase #5477 #5561
[Feature][Connector-V2] HTTP supports page increase #5477 #5561
Conversation
LICENSE
Outdated
seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/sqlengine/zeta/ZetaSQLType.java from https://github.com/JSQLParser/JSqlParser | ||
seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/sqlengine/zeta/ZetaSQLFilter.java from https://github.com/JSQLParser/JSqlParser | ||
seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/sqlengine/zeta/ZetaSQLFunction.java from https://github.com/JSQLParser/JSqlParser | ||
MIT License |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
docs/en/connector-v2/source/Http.md
Outdated
| json_field | Config | No | - | This parameter helps you configure the schema,so this parameter must be used with schema. | | ||
| pageing | Config | No | - | This parameter is used for paging queries | | ||
| pageing.page_field | String | No | - | This parameter is used to specify the page field name in the request parameter | | ||
| *pageing.max_page_size* | Int | No | 10000 | Change the parameter to control the maximum number of pages (if using paging please ensure that this value is greater than the target number of pages otherwise it may cause early *data accuracy* problems) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| *pageing.max_page_size* | Int | No | 10000 | Change the parameter to control the maximum number of pages (if using paging please ensure that this value is greater than the target number of pages otherwise it may cause early *data accuracy* problems) | | |
| pageing.max_page_size | Int | No | 10000 | Change the parameter to control the maximum number of pages (if using paging please ensure that this value is greater than the target number of pages otherwise it may cause early *data accuracy* problems) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove *
on pageing.max_page_size
docs/en/connector-v2/source/Http.md
Outdated
| pageing.json_verify_expression | String | No | - | This parameter is used verify that the condition is met through json path | | ||
| pageing.json_verify_value | String | No | - | This parameter must be configured after json_verify_expression is configured (make sure verify value is in this parameter). | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't find the code you use these two config, and I don't know what's used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two parameters are primarily used in cases where there is no total page count field in the API response to determine the number of iterations. They allow you to pinpoint a field using a JSON path and then decide whether to continue the task based on whether a specific condition is met (typically, whether the target value is found in json_verify_value).
@xiaofan2022 Also please add e2e test case for this new feature, you can refer https://github.com/apache/seatunnel/tree/dev/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-http-e2e/src/test. |
LICENSE
Outdated
@@ -178,15 +178,15 @@ | |||
APPENDIX: How to apply the Apache License to your work. | |||
|
|||
To apply the Apache License to your work, attach the following | |||
boilerplate notice, with the fields enclosed by brackets "{}" | |||
boilerplate notice, with the fields enclosed by brackets "[]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert this file change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Options.key("json_verify_expression") | ||
.stringType() | ||
.noDefaultValue() | ||
.withDescription("json verify expression "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description not clear, I don't know what's mean if I only read json verify expression
.
docs/en/connector-v2/source/Http.md
Outdated
| json_field | Config | No | - | This parameter helps you configure the schema,so this parameter must be used with schema. | | ||
| pageing | Config | No | - | This parameter is used for paging queries | | ||
| pageing.page_field | String | No | - | This parameter is used to specify the page field name in the request parameter | | ||
| *pageing.max_page_size* | Int | No | 10000 | Change the parameter to control the maximum number of pages (if using paging please ensure that this value is greater than the target number of pages otherwise it may cause early *data accuracy* problems) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove *
on pageing.max_page_size
Options.key("json_verify_value") | ||
.stringType() | ||
.noDefaultValue() | ||
.withDescription("json verify value "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- json_verify_expression: retrieve and validate target field values using a json_path expression
- json_verify_value: verify if the value of the JSON verify expression matches the expected value.
- max_page_size: The "pageing.max_page_size" parameter is primarily used to prevent infinite loops when there is no configuration for validation through JSONPath, ensuring that the program can complete its execution even if the termination condition is not triggered.
Options.key("page_field") | ||
.stringType() | ||
.defaultValue("page") | ||
.withDescription("page field"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto. I think you should copy description from doc to here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
json_field = { | ||
name = "$.data[*].name" | ||
age = "$.data[*].age" | ||
} | ||
pageing={ | ||
total_page_size=2 | ||
page_field=page | ||
} | ||
schema = { | ||
fields { | ||
name = string | ||
age = int | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
json_field = { | ||
name = "$.data[*].name" | ||
age = "$.data[*].age" | ||
} | ||
pageing={ | ||
json_verify_expression="$.hasNext" | ||
json_verify_value="false" | ||
page_field=page | ||
} | ||
schema = { | ||
fields { | ||
name = string | ||
age = int | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
"age": 51 | ||
} | ||
], | ||
"currentPageIndex": 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"currentPageIndex": 1, | |
"currentPageIndex": 2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
After a second consider, I'm a little worry about the scene this PR try to solved. Normally, we can try to use batch size to check we can continue read next page or not.
Could you support this for default way to check read next page or not? I think this way will more useful and simple. |
if read size>100? next batch? |
yes, but if |
|
It would not care the page size, imagine that if the total amount of data returned by the API keeps rising, the pagesize size may need to be changed every time a task is run, but by specifying the batchsize, we can always read without changing the job file to read all data. |
I did not change the Engine-Client related code, why ci/cd error |
Never mind, this is a bug will be fixed in #5710 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just one comment
this.httpParameter | ||
.getParams() | ||
.put(pageInfo.getPageField(), pageInfo.getPageIndex().toString()); | ||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
docs/en/connector-v2/source/Http.md
Outdated
@@ -310,6 +314,35 @@ source { | |||
- Test data can be found at this link [mockserver-config.json](../../../../seatunnel-e2e/seatunnel-connector-v2-e2e/connector-http-e2e/src/test/resources/mockserver-config.json) | |||
- See this link for task configuration [http_jsonpath_to_assert.conf](../../../../seatunnel-e2e/seatunnel-connector-v2-e2e/connector-http-e2e/src/test/resources/http_jsonpath_to_assert.conf). | |||
|
|||
### pageing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
### pageing | |
### pageing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @xiaofan2022 !
Purpose of this pull request
http supports page increase #5477
Does this PR introduce any user-facing change?
How was this patch tested?
Check list
New License Guide
release-note
.