[SPARK-23275][SQL] hive/tests have been failing when run locally on the laptop (Mac) with OOM #20441

Closed

dilipbiswal
Contributor

What changes were proposed in this pull request?

The Hive tests have been failing when run locally (macOS) after a recent change in trunk. After running for some time, the tests fail with an OOM error: unable to create new native thread.

I noticed that the thread count climbs to 2000+, after which these OOM errors start appearing. Most of the threads seem to belong to the connection pool in the Hive metastore (BoneCP-xxxxx-xxxx). The behaviour changed after we made the following change to HiveClientImpl.reset():

 def reset(): Unit = withHiveState {
    try {
      // code
    } finally {
      runSqlHive("USE default")  // <== this is causing the issue
    }
 }

I am proposing to temporarily back out part of the fix made for SPARK-23000 to resolve this issue while we work out the exact reason for the sudden increase in thread count.
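
For anyone who wants to reproduce the thread-count observation above, here is a minimal sketch of a per-prefix thread census for the current JVM. It is not part of this patch; `ThreadCensus`, `prefixOf` and `countByPrefix` are hypothetical names, and grouping on the text before the first `-` is only an assumption that matches names of the form BoneCP-xxxxx-xxxx.

```scala
object ThreadCensus {
  // Group key: the thread name up to the first '-', so a name of the form
  // "BoneCP-xxxxx-xxxx" falls into the "BoneCP" group. (Assumed naming scheme.)
  def prefixOf(name: String): String = name.takeWhile(_ != '-')

  // Count the live threads of this JVM per name prefix, largest group first.
  def countByPrefix(): Seq[(String, Int)] = {
    val threads = Thread.getAllStackTraces.keySet.toArray(new Array[Thread](0)).toSeq
    threads
      .groupBy(t => prefixOf(t.getName))
      .map { case (prefix, group) => prefix -> group.size }
      .toSeq
      .sortBy { case (_, count) => -count }
  }

  def main(args: Array[String]): Unit =
    countByPrefix().take(10).foreach { case (prefix, count) =>
      println(f"$count%6d  $prefix")
    }
}
```

Calling `ThreadCensus.countByPrefix()` periodically from inside the test JVM should show whether one group keeps growing; a thread dump taken with `jstack` gives the same information from outside the process.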

How was this patch tested?

Ran hive/test multiple times on different machines.


@dilipbiswal dilipbiswal changed the title [SPARK-23275] hive/tests have been failing when run locally on the laptop (Mac) with OOM [SPARK-23275][SQL] hive/tests have been failing when run locally on the laptop (Mac) with OOM Jan 30, 2018
@gatorsmile
Member

@liufengdb You might be interested in this

@SparkQA

SparkQA commented Jan 30, 2018

Test build #86844 has finished for PR 20441 at commit 983aa18.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@gatorsmile gatorsmile left a comment

This is a test-only PR. This is an interesting finding. We can dig deeper into it in the future.

Thanks for reporting this issue.

LGTM

@gatorsmile
Member

Thanks! Merged to master/2.3

asfgit pushed a commit that referenced this pull request Jan 30, 2018
…he laptop (Mac) with OOM

Author: Dilip Biswal <[email protected]>

Closes #20441 from dilipbiswal/hive_tests.

(cherry picked from commit 58fcb5a)
Signed-off-by: gatorsmile <[email protected]>
@dilipbiswal
Contributor Author

Many thanks @gatorsmile .

@asfgit asfgit closed this in 58fcb5a Jan 30, 2018
@srowen
Member

srowen commented Jan 31, 2018

For the record, what was the error? I got the following when running on OS X from master with -Phive before this change. Was that the same problem?

HiveWindowFunctionQuerySuite:
*** RUN ABORTED ***
  org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table part. Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;

...

Exception in thread "block-manager-slave-async-thread-pool-23" java.lang.OutOfMemoryError: unable to create new native thread

@dilipbiswal
Contributor Author

@srowen Hello, yes, I saw the same error. There were quite a few errors like

java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

with underlying reason

Caused by: java.lang.OutOfMemoryError: unable to create new native thread

@liufengdb

@gatorsmile Sorry for the late reply. I think the root cause is in the Hive metastore. I created a PR to bypass it: #20562

asfgit pushed a commit that referenced this pull request Feb 10, 2018
## What changes were proposed in this pull request?

This is a follow up of #20441.

The two lines can actually trigger the Hive metastore bug: https://issues.apache.org/jira/browse/HIVE-16844

The two configs are not in the default `ObjectStore` properties, so any Hive command run after these two lines will set the `propsChanged` flag in `ObjectStore.setConf` and then cause thread leaks.

I don't think the two lines are very useful. They can be removed safely.
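
To make the leak pattern described above concrete, here is a toy model, not Hive's actual code. `ToyPool`, `ToyStore` and `run` are hypothetical names, the pool holds no real threads, and alternating between two property maps is an assumption chosen only to make the flag flip on every call. It sketches how a store that re-initializes whenever its properties differ can accumulate open pools if the old pool is never closed.

```scala
object PropsChangedLeak {
  // Hypothetical stand-in for a connection pool; a real pool would own helper threads.
  final class ToyPool {
    var open: Boolean = true
    def close(): Unit = { open = false }
  }

  // Hypothetical stand-in for a store that rebuilds its pool when properties change.
  final class ToyStore(defaults: Map[String, String]) {
    private var current: Map[String, String] = defaults
    private var pools: List[ToyPool] = List(new ToyPool)

    def setConf(props: Map[String, String], closeOld: Boolean): Unit = {
      val propsChanged = props != current
      if (propsChanged) {
        // Skipping this close is the leak: the old pool stays open forever.
        if (closeOld) pools.head.close()
        pools = new ToyPool :: pools
        current = props
      }
    }

    def openPools: Int = pools.count(_.open)
  }

  // Issue `commands` calls that alternate between the default properties and a
  // map with one extra key (an assumption), then report how many pools are open.
  def run(closeOld: Boolean, commands: Int): Int = {
    val defaults = Map("a" -> "1")
    val withExtra = defaults + ("extra.key" -> "value")
    val store = new ToyStore(defaults)
    for (i <- 0 until commands) {
      store.setConf(if (i % 2 == 0) withExtra else defaults, closeOld)
    }
    store.openPools
  }
}
```

In this model `run(closeOld = false, commands = 10)` leaves 11 pools open, while `run(closeOld = true, commands = 10)` leaves 1. Keeping the extra configs out of the properties, as this patch does, means the flag is not set in the first place.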

## How was this patch tested?

Author: Feng Liu <[email protected]>

Closes #20562 from liufengdb/fix-omm.

(cherry picked from commit 6d7c383)
Signed-off-by: gatorsmile <[email protected]>
ghost pushed a commit to dbtsai/spark that referenced this pull request Feb 10, 2018
robert3005 pushed a commit to palantir/spark that referenced this pull request Feb 12, 2018