[HUDI-6695] Use the AWS provider chain in Glue sync and add a new provider for STS assume role #9260

Merged: 7 commits merged on Nov 15, 2023

Member

@hussein-awala hussein-awala commented Jul 21, 2023

Change Logs

This PR:

  • Adds a new AWS credentials provider, HoodieConfigAWSAssumedRoleCredentialsProvider, which assumes an AWS role using the default provider chain.
  • Uses the Hudi AWS provider chain in the Glue sync client. Currently that chain is used only for DynamoDB and CloudWatch, while Glue uses the default AWS chain.

Impact

Users will be able to use the same configuration for Glue, DynamoDB, and CloudWatch, and can also assume a role using the default provider chain.

Risk level (write none, low, medium or high below)

none

Documentation Update

I just updated the doc for Amazon Web Services Configs.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

Member Author

@hussein-awala hussein-awala left a comment

I will test it next week, and add some unit tests

public HoodieConfigAWSAssumedRoleCredentialsProvider(Properties props) {
  if (!validConf(props)) {
    LOG.debug("AWS role ARN not found in the Hudi configuration.");
    throw new IllegalArgumentException("AWS role ARN not found in the Hudi configuration.");
Member Author

We could skip raising an exception and let getCredentials return null when no role ARN is provided; in that case, the client would try the next provider in the chain.
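The fallback idea in this comment can be sketched with a simplified chain. This is a minimal model using only the JDK, not the AWS SDK's chain implementation; `ProviderChainSketch`, `Credentials`, `resolve`, `assumedRole`, and the example role ARN are all hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Simplified model of a credentials provider chain (assumption: not the AWS SDK code).
class ProviderChainSketch {

  // Hypothetical stand-in for a credentials object; records which provider produced it.
  static final class Credentials {
    final String source;

    Credentials(String source) {
      this.source = source;
    }
  }

  // Tries each provider in order and returns the first non-null result.
  // A provider that returns null or throws is skipped.
  static Optional<Credentials> resolve(List<Supplier<Credentials>> providers) {
    for (Supplier<Credentials> provider : providers) {
      try {
        Credentials credentials = provider.get();
        if (credentials != null) {
          return Optional.of(credentials);
        }
      } catch (RuntimeException e) {
        // Ignore the failure and fall through to the next provider.
      }
    }
    return Optional.empty();
  }

  // Hypothetical assume-role provider: yields nothing when no role ARN is configured.
  static Supplier<Credentials> assumedRole(String roleArn) {
    return () -> (roleArn == null || roleArn.isEmpty()) ? null : new Credentials("assumed-role");
  }

  public static void main(String[] args) {
    Supplier<Credentials> defaultProvider = () -> new Credentials("default");

    // No role ARN configured: the chain falls back to the default provider.
    System.out.println(resolve(Arrays.asList(assumedRole(null), defaultProvider)).get().source);

    // Role ARN configured (placeholder value): the assumed-role provider is used.
    String exampleArn = "arn:aws:iam::123456789012:role/example";
    System.out.println(resolve(Arrays.asList(assumedRole(exampleArn), defaultProvider)).get().source);
  }
}
```

Whether a real chain moves on when a provider returns null or only when it throws depends on the SDK version in use, so that behavior should be checked against the actual client before relying on it.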

Contributor

There is no need to log this at debug level.

Contributor

Make it an error message, since an exception is thrown right after it.

Member Author

I added it to be aligned with:

LOG.debug("AWS access key or secret key not found in the Hudi configuration. "

WDYT?

Contributor

@hussein-awala: Can you make it logger.error in both places?

Member Author

sure

Member Author

This one will be removed as we discussed here; I will change the log level of the other class later in a separate PR.

@hussein-awala hussein-awala changed the title from "[WIP] Use the AWS provider chain in Glue sync and add a new provider for STS assume role" to "[HUDI-6695] Use the AWS provider chain in Glue sync and add a new provider for STS assume role" on Aug 15, 2023
@hussein-awala hussein-awala marked this pull request as ready for review August 15, 2023 17:38
@hussein-awala
Member Author

@danny0405 could you help to review this PR?

private final StsAssumeRoleCredentialsProvider credentialsProvider;

public HoodieConfigAWSAssumedRoleCredentialsProvider(Properties props) {
  if (!validConf(props)) {
Contributor

The caller already guarantees that validConf is always true, so do we still need this branch?

Member Author

The validation is needed to switch to the second provider in the chain, as we do here:

if (StringUtils.isNullOrEmpty(accessKey) || StringUtils.isNullOrEmpty(secretKey)) {
  LOG.debug("AWS access key or secret key not found in the Hudi configuration. "
      + "Use default AWS credentials");
} else {

Contributor

Isn't !validConf(props) always false?

Member Author

No, it is not. It checks that the property exists and is not null or empty:

public static boolean validConf(Properties props) {
  String roleArn = props.getProperty(HoodieAWSConfig.AWS_ASSUME_ROLE_ARN.key());
  return !StringUtils.isNullOrEmpty(roleArn);
}

If it is not valid, it passes to the next provider in the chain.

Contributor

Yeah, but HoodieConfigAWSAssumedRoleCredentialsProvider is only added when it is valid?

Contributor

Agree with @danny0405. As this class is instantiated only through the factory, which already checks validConf, we can skip the check here. @hussein-awala: Do you see any concerns here?

Member Author

I rechecked it, and I agree with both of you; I will remove it.
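The conclusion of this thread, that a check in the factory makes the same check in the constructor redundant, can be sketched as follows. This is a minimal illustration under stated assumptions: `GatedFactorySketch`, `buildChain`, and the `ROLE_ARN_KEY` value are hypothetical (the real key comes from HoodieAWSConfig.AWS_ASSUME_ROLE_ARN.key()), and providers are represented by plain strings rather than SDK objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Sketch of a factory that only adds the assume-role provider when it is configured.
class GatedFactorySketch {

  // Hypothetical property key used only for this illustration.
  static final String ROLE_ARN_KEY = "example.aws.role.arn";

  // Mirrors the validConf idea from the thread: the role ARN must be present and non-empty.
  static boolean validConf(Properties props) {
    String roleArn = props.getProperty(ROLE_ARN_KEY);
    return roleArn != null && !roleArn.isEmpty();
  }

  // Returns provider names in chain order. Because the assume-role provider is
  // added only when validConf is true, its constructor never sees an invalid config.
  static List<String> buildChain(Properties props) {
    List<String> chain = new ArrayList<>();
    if (validConf(props)) {
      chain.add("assumed-role");
    }
    chain.add("default");
    return chain;
  }

  public static void main(String[] args) {
    Properties withoutRole = new Properties();
    System.out.println(buildChain(withoutRole));

    Properties withRole = new Properties();
    withRole.setProperty(ROLE_ARN_KEY, "arn:aws:iam::123456789012:role/example");
    System.out.println(buildChain(withRole));
  }
}
```

Keeping the gate in one place means a missing role ARN is an ordinary "not configured" case rather than an exception path.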

@bvaradar bvaradar self-assigned this Nov 14, 2023
@hussein-awala hussein-awala force-pushed the glue_assume_role_provider branch from 0a6a3c2 to e01d9a9 Compare November 14, 2023 23:06
@hussein-awala hussein-awala force-pushed the glue_assume_role_provider branch from e01d9a9 to a3a7ca2 Compare November 14, 2023 23:07
Contributor

@bvaradar bvaradar left a comment

LGTM

@bvaradar bvaradar merged commit abd3afc into apache:master Nov 15, 2023
30 checks passed
yihua added a commit to yihua/hudi that referenced this pull request Nov 16, 2023
… new provider for STS assume role (apache#9260)"

This reverts commit abd3afc.
@yihua
Contributor

yihua commented Nov 16, 2023

@hussein-awala The newly added test TestHoodieAWSCredentialsProviderFactory#testGetAWSCredentialsWithInvalidAssumeRole fails Azure CI on master. Could you take a look (HUDI-7114)?

[ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.391 s <<< FAILURE! - in org.apache.hudi.aws.TestHoodieAWSCredentialsProviderFactory
[ERROR] testGetAWSCredentialsWithInvalidAssumeRole  Time elapsed: 0.374 s  <<< ERROR!
software.amazon.awssdk.core.exception.SdkClientException: Unable to load region from any of the providers in the chain software.amazon.awssdk.regions.providers.DefaultAwsRegionProviderChain@5496c165: [software.amazon.awssdk.regions.providers.SystemSettingsRegionProvider@499683c4: Unable to load region from system settings. Region must be specified either via environment variable (AWS_REGION) or  system property (aws.region)., software.amazon.awssdk.regions.providers.AwsProfileRegionProvider@e48bf9a: No region provided in profile: default, software.amazon.awssdk.regions.providers.InstanceProfileRegionProvider@351f2244: Unable to retrieve region information from EC2 Metadata service. Please make sure the application is running on EC2.]
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
	at software.amazon.awssdk.regions.providers.AwsRegionProviderChain.getRegion(AwsRegionProviderChain.java:70)
	at software.amazon.awssdk.awscore.client.builder.AwsDefaultClientBuilder.regionFromDefaultProvider(AwsDefaultClientBuilder.java:281)
	at software.amazon.awssdk.awscore.client.builder.AwsDefaultClientBuilder.resolveRegion(AwsDefaultClientBuilder.java:263)
	at software.amazon.awssdk.awscore.client.builder.AwsDefaultClientBuilder.finalizeChildConfiguration(AwsDefaultClientBuilder.java:184)
	at software.amazon.awssdk.core.client.builder.SdkDefaultClientBuilder.syncClientConfiguration(SdkDefaultClientBuilder.java:181)
	at software.amazon.awssdk.services.sts.DefaultStsClientBuilder.buildClient(DefaultStsClientBuilder.java:36)
	at software.amazon.awssdk.services.sts.DefaultStsClientBuilder.buildClient(DefaultStsClientBuilder.java:25)
	at software.amazon.awssdk.core.client.builder.SdkDefaultClientBuilder.build(SdkDefaultClientBuilder.java:148)
	at org.apache.hudi.aws.credentials.HoodieConfigAWSAssumedRoleCredentialsProvider.<init>(HoodieConfigAWSAssumedRoleCredentialsProvider.java:48)
	at org.apache.hudi.aws.credentials.HoodieAWSCredentialsProviderFactory.getAwsCredentialsProviderChain(HoodieAWSCredentialsProviderFactory.java:40)
	at org.apache.hudi.aws.credentials.HoodieAWSCredentialsProviderFactory.getAwsCredentialsProvider(HoodieAWSCredentialsProviderFactory.java:34)
	at org.apache.hudi.aws.TestHoodieAWSCredentialsProviderFactory.testGetAWSCredentialsWithInvalidAssumeRole(TestHoodieAWSCredentialsProviderFactory.java:52)

@hussein-awala
Member Author

I will take a look
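The error message in the stack trace states that the region must be supplied through the AWS_REGION environment variable or the aws.region system property. One way a test could satisfy that requirement is to set the system property around the call and restore it afterwards. The sketch below is only an illustration of that idea, not necessarily how HUDI-7114 was resolved; `RegionPropertySketch` and `withRegion` are hypothetical names, and "us-east-1" is an arbitrary example value.

```java
import java.util.function.Supplier;

// Sketch: run a piece of code with the aws.region system property set, then restore it.
class RegionPropertySketch {

  // System property named in the SDK error message above.
  static final String REGION_PROPERTY = "aws.region";

  static <T> T withRegion(String region, Supplier<T> body) {
    String previous = System.getProperty(REGION_PROPERTY);
    System.setProperty(REGION_PROPERTY, region);
    try {
      return body.get();
    } finally {
      // Restore the previous state so other tests are not affected.
      if (previous == null) {
        System.clearProperty(REGION_PROPERTY);
      } else {
        System.setProperty(REGION_PROPERTY, previous);
      }
    }
  }

  public static void main(String[] args) {
    // In a real test, the supplier would build the credentials provider under test.
    String seen = withRegion("us-east-1", () -> System.getProperty(REGION_PROPERTY));
    System.out.println(seen);
  }
}
```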

jonvex pushed a commit to jonvex/hudi that referenced this pull request Nov 29, 2023
parisni pushed a commit to leboncoin/hudi that referenced this pull request Jan 29, 2024