Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-4443][SQL] Fix statistics for external table in spark sql hive #3304

Closed
wants to merge 1 commit into from

Conversation

scwf
Copy link
Contributor

@scwf scwf commented Nov 17, 2014

The totalSize of external table is always zero, which will influence join strategy(always use broadcast join for external table).

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23464/
Test FAILed.

@scwf
Copy link
Contributor Author

scwf commented Nov 17, 2014

retest it please.

@SparkQA
Copy link

SparkQA commented Nov 17, 2014

Test build #23475 has started for PR 3304 at commit 568f321.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 17, 2014

Test build #23475 has finished for PR 3304 at commit 568f321.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23475/
Test PASSed.

@marmbrus
Copy link
Contributor

Good catch! Merged to master and 1.2

@asfgit asfgit closed this in 42389b1 Nov 18, 2014
asfgit pushed a commit that referenced this pull request Nov 18, 2014
The `totalSize` of external table  is always zero, which will influence join strategy(always use broadcast join for external table).

Author: w00228970 <[email protected]>

Closes #3304 from scwf/statistics and squashes the following commits:

568f321 [w00228970] fix statistics for external table

(cherry picked from commit 42389b1)
Signed-off-by: Michael Armbrust <[email protected]>
@scwf scwf deleted the statistics branch November 18, 2014 00:54
yaooqinn pushed a commit that referenced this pull request Aug 26, 2024
…42.7.4 and `mssql` to 12.8.1.jre11

### What changes were proposed in this pull request?

This PR aims to upgrade `h2` to 2.3.232, `postgresql` to 42.7.4 and `mssql` to 12.8.1.jre11.

### Why are the changes needed?

1. For `h2`, there are some issues fixed in version 2.3.232(full release notes: https://www.h2database.com/html/changelog.html):

    - [Issue #3945](h2database/h2database#3945): Column not found in correlated subquery, when referencing outer column from LEFT JOIN .. ON clause
    - [Issue #4097](h2database/h2database#4097): StackOverflowException when using multiple SELECT statements in one query (2.3.230)
    - [Issue #3982](h2database/h2database#3982): Potential issue when using ROUND
    - [Issue #3894](h2database/h2database#3894): Race condition causing stale data in query last result cache
    - [Issue #4075](h2database/h2database#4075): infinite loop in compact
    - [Issue #4091](h2database/h2database#4091): Wrong case with linked table to postgresql
    - [Issue #4088](h2database/h2database#4088): BadGrammarException when the same alias is used within two different CTEs

2. For `postgresql`, there are some issues fixed and improvements in version 42.7.4(full release notes: https://jdbc.postgresql.org/changelogs/2024-08-22-42.7.4-release/):

    - fix: PgInterval ignores case for represented interval string [PR #3344](pgjdbc/pgjdbc#3344)
    - perf: Avoid extra copies when receiving int4 and int2 in PGStream [PR #3295](pgjdbc/pgjdbc#3295)
    - fix: Add support for Infinity::numeric values in ResultSet.getObject [PR #3304](pgjdbc/pgjdbc#3304)
    - fix: Ensure order of results for getDouble [PR #3301](pgjdbc/pgjdbc#3301)
    - perf: Replace BufferedOutputStream with unsynchronized PgBufferedOutputStream, allow configuring different Java and SO_SNDBUF buffer sizes [PR #3248](pgjdbc/pgjdbc#3248)
    - fix: Fix SSL tests [PR #3260](pgjdbc/pgjdbc#3260)
    - fix: Support bytea in preferQueryMode=simple [PR #3243](pgjdbc/pgjdbc#3243)
    - fix: Fix [Issue #3234](pgjdbc/pgjdbc#3234) - Return -1 as update count for stored procedure calls [PR #3235](pgjdbc/pgjdbc#3235)
    - fix: Fix [Issue #3224](pgjdbc/pgjdbc#3224) - conversion for TIME ‘24:00’ to LocalTime breaks in binary-mode [PR #3225](pgjdbc/pgjdbc#3225)

3. For `mssql`,  there are some issues fixed in 12.8.1.jre11(full release notes: https://github.com/microsoft/mssql-jdbc/releases/tag/v12.8.1):

    - Adjusted DESTINATION_COL_METADATA_LOCK, in SQLServerBulkCopy, so that is properly released in all cases [PR #2492](microsoft/mssql-jdbc#2492)
    - Reverted "Execute Stored Procedures Directly" feature, as well as subsequent changes related to the feature [PR #2493](microsoft/mssql-jdbc#2493)
    - Changed driver behavior to allow prepared statement objects to be reused, preventing a "multiple queries are not allowed" error [PR #2494](microsoft/mssql-jdbc#2494)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47810 from wayneguow/ug_h2.

Authored-by: Wei Guo <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
IvanK-db pushed a commit to IvanK-db/spark that referenced this pull request Sep 20, 2024
…42.7.4 and `mssql` to 12.8.1.jre11

### What changes were proposed in this pull request?

This PR aims to upgrade `h2` to 2.3.232, `postgresql` to 42.7.4 and `mssql` to 12.8.1.jre11.

### Why are the changes needed?

1. For `h2`, there are some issues fixed in version 2.3.232(full release notes: https://www.h2database.com/html/changelog.html):

    - [Issue apache#3945](h2database/h2database#3945): Column not found in correlated subquery, when referencing outer column from LEFT JOIN .. ON clause
    - [Issue apache#4097](h2database/h2database#4097): StackOverflowException when using multiple SELECT statements in one query (2.3.230)
    - [Issue apache#3982](h2database/h2database#3982): Potential issue when using ROUND
    - [Issue apache#3894](h2database/h2database#3894): Race condition causing stale data in query last result cache
    - [Issue apache#4075](h2database/h2database#4075): infinite loop in compact
    - [Issue apache#4091](h2database/h2database#4091): Wrong case with linked table to postgresql
    - [Issue apache#4088](h2database/h2database#4088): BadGrammarException when the same alias is used within two different CTEs

2. For `postgresql`, there are some issues fixed and improvements in version 42.7.4(full release notes: https://jdbc.postgresql.org/changelogs/2024-08-22-42.7.4-release/):

    - fix: PgInterval ignores case for represented interval string [PR apache#3344](pgjdbc/pgjdbc#3344)
    - perf: Avoid extra copies when receiving int4 and int2 in PGStream [PR apache#3295](pgjdbc/pgjdbc#3295)
    - fix: Add support for Infinity::numeric values in ResultSet.getObject [PR apache#3304](pgjdbc/pgjdbc#3304)
    - fix: Ensure order of results for getDouble [PR apache#3301](pgjdbc/pgjdbc#3301)
    - perf: Replace BufferedOutputStream with unsynchronized PgBufferedOutputStream, allow configuring different Java and SO_SNDBUF buffer sizes [PR apache#3248](pgjdbc/pgjdbc#3248)
    - fix: Fix SSL tests [PR apache#3260](pgjdbc/pgjdbc#3260)
    - fix: Support bytea in preferQueryMode=simple [PR apache#3243](pgjdbc/pgjdbc#3243)
    - fix: Fix [Issue apache#3234](pgjdbc/pgjdbc#3234) - Return -1 as update count for stored procedure calls [PR apache#3235](pgjdbc/pgjdbc#3235)
    - fix: Fix [Issue apache#3224](pgjdbc/pgjdbc#3224) - conversion for TIME ‘24:00’ to LocalTime breaks in binary-mode [PR apache#3225](pgjdbc/pgjdbc#3225)

3. For `mssql`,  there are some issues fixed in 12.8.1.jre11(full release notes: https://github.com/microsoft/mssql-jdbc/releases/tag/v12.8.1):

    - Adjusted DESTINATION_COL_METADATA_LOCK, in SQLServerBulkCopy, so that is properly released in all cases [PR apache#2492](microsoft/mssql-jdbc#2492)
    - Reverted "Execute Stored Procedures Directly" feature, as well as subsequent changes related to the feature [PR apache#2493](microsoft/mssql-jdbc#2493)
    - Changed driver behavior to allow prepared statement objects to be reused, preventing a "multiple queries are not allowed" error [PR apache#2494](microsoft/mssql-jdbc#2494)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47810 from wayneguow/ug_h2.

Authored-by: Wei Guo <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…42.7.4 and `mssql` to 12.8.1.jre11

### What changes were proposed in this pull request?

This PR aims to upgrade `h2` to 2.3.232, `postgresql` to 42.7.4 and `mssql` to 12.8.1.jre11.

### Why are the changes needed?

1. For `h2`, there are some issues fixed in version 2.3.232(full release notes: https://www.h2database.com/html/changelog.html):

    - [Issue apache#3945](h2database/h2database#3945): Column not found in correlated subquery, when referencing outer column from LEFT JOIN .. ON clause
    - [Issue apache#4097](h2database/h2database#4097): StackOverflowException when using multiple SELECT statements in one query (2.3.230)
    - [Issue apache#3982](h2database/h2database#3982): Potential issue when using ROUND
    - [Issue apache#3894](h2database/h2database#3894): Race condition causing stale data in query last result cache
    - [Issue apache#4075](h2database/h2database#4075): infinite loop in compact
    - [Issue apache#4091](h2database/h2database#4091): Wrong case with linked table to postgresql
    - [Issue apache#4088](h2database/h2database#4088): BadGrammarException when the same alias is used within two different CTEs

2. For `postgresql`, there are some issues fixed and improvements in version 42.7.4(full release notes: https://jdbc.postgresql.org/changelogs/2024-08-22-42.7.4-release/):

    - fix: PgInterval ignores case for represented interval string [PR apache#3344](pgjdbc/pgjdbc#3344)
    - perf: Avoid extra copies when receiving int4 and int2 in PGStream [PR apache#3295](pgjdbc/pgjdbc#3295)
    - fix: Add support for Infinity::numeric values in ResultSet.getObject [PR apache#3304](pgjdbc/pgjdbc#3304)
    - fix: Ensure order of results for getDouble [PR apache#3301](pgjdbc/pgjdbc#3301)
    - perf: Replace BufferedOutputStream with unsynchronized PgBufferedOutputStream, allow configuring different Java and SO_SNDBUF buffer sizes [PR apache#3248](pgjdbc/pgjdbc#3248)
    - fix: Fix SSL tests [PR apache#3260](pgjdbc/pgjdbc#3260)
    - fix: Support bytea in preferQueryMode=simple [PR apache#3243](pgjdbc/pgjdbc#3243)
    - fix: Fix [Issue apache#3234](pgjdbc/pgjdbc#3234) - Return -1 as update count for stored procedure calls [PR apache#3235](pgjdbc/pgjdbc#3235)
    - fix: Fix [Issue apache#3224](pgjdbc/pgjdbc#3224) - conversion for TIME ‘24:00’ to LocalTime breaks in binary-mode [PR apache#3225](pgjdbc/pgjdbc#3225)

3. For `mssql`,  there are some issues fixed in 12.8.1.jre11(full release notes: https://github.com/microsoft/mssql-jdbc/releases/tag/v12.8.1):

    - Adjusted DESTINATION_COL_METADATA_LOCK, in SQLServerBulkCopy, so that is properly released in all cases [PR apache#2492](microsoft/mssql-jdbc#2492)
    - Reverted "Execute Stored Procedures Directly" feature, as well as subsequent changes related to the feature [PR apache#2493](microsoft/mssql-jdbc#2493)
    - Changed driver behavior to allow prepared statement objects to be reused, preventing a "multiple queries are not allowed" error [PR apache#2494](microsoft/mssql-jdbc#2494)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47810 from wayneguow/ug_h2.

Authored-by: Wei Guo <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
…42.7.4 and `mssql` to 12.8.1.jre11

### What changes were proposed in this pull request?

This PR aims to upgrade `h2` to 2.3.232, `postgresql` to 42.7.4 and `mssql` to 12.8.1.jre11.

### Why are the changes needed?

1. For `h2`, there are some issues fixed in version 2.3.232(full release notes: https://www.h2database.com/html/changelog.html):

    - [Issue apache#3945](h2database/h2database#3945): Column not found in correlated subquery, when referencing outer column from LEFT JOIN .. ON clause
    - [Issue apache#4097](h2database/h2database#4097): StackOverflowException when using multiple SELECT statements in one query (2.3.230)
    - [Issue apache#3982](h2database/h2database#3982): Potential issue when using ROUND
    - [Issue apache#3894](h2database/h2database#3894): Race condition causing stale data in query last result cache
    - [Issue apache#4075](h2database/h2database#4075): infinite loop in compact
    - [Issue apache#4091](h2database/h2database#4091): Wrong case with linked table to postgresql
    - [Issue apache#4088](h2database/h2database#4088): BadGrammarException when the same alias is used within two different CTEs

2. For `postgresql`, there are some issues fixed and improvements in version 42.7.4(full release notes: https://jdbc.postgresql.org/changelogs/2024-08-22-42.7.4-release/):

    - fix: PgInterval ignores case for represented interval string [PR apache#3344](pgjdbc/pgjdbc#3344)
    - perf: Avoid extra copies when receiving int4 and int2 in PGStream [PR apache#3295](pgjdbc/pgjdbc#3295)
    - fix: Add support for Infinity::numeric values in ResultSet.getObject [PR apache#3304](pgjdbc/pgjdbc#3304)
    - fix: Ensure order of results for getDouble [PR apache#3301](pgjdbc/pgjdbc#3301)
    - perf: Replace BufferedOutputStream with unsynchronized PgBufferedOutputStream, allow configuring different Java and SO_SNDBUF buffer sizes [PR apache#3248](pgjdbc/pgjdbc#3248)
    - fix: Fix SSL tests [PR apache#3260](pgjdbc/pgjdbc#3260)
    - fix: Support bytea in preferQueryMode=simple [PR apache#3243](pgjdbc/pgjdbc#3243)
    - fix: Fix [Issue apache#3234](pgjdbc/pgjdbc#3234) - Return -1 as update count for stored procedure calls [PR apache#3235](pgjdbc/pgjdbc#3235)
    - fix: Fix [Issue apache#3224](pgjdbc/pgjdbc#3224) - conversion for TIME ‘24:00’ to LocalTime breaks in binary-mode [PR apache#3225](pgjdbc/pgjdbc#3225)

3. For `mssql`,  there are some issues fixed in 12.8.1.jre11(full release notes: https://github.com/microsoft/mssql-jdbc/releases/tag/v12.8.1):

    - Adjusted DESTINATION_COL_METADATA_LOCK, in SQLServerBulkCopy, so that is properly released in all cases [PR apache#2492](microsoft/mssql-jdbc#2492)
    - Reverted "Execute Stored Procedures Directly" feature, as well as subsequent changes related to the feature [PR apache#2493](microsoft/mssql-jdbc#2493)
    - Changed driver behavior to allow prepared statement objects to be reused, preventing a "multiple queries are not allowed" error [PR apache#2494](microsoft/mssql-jdbc#2494)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47810 from wayneguow/ug_h2.

Authored-by: Wei Guo <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants