Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23312][SQL][followup] add a config to turn off vectorized cache reader #20513

Closed
wants to merge 2 commits into from

Conversation

cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

#20483 tried to provide a way to turn off the new columnar cache reader, to restore the behavior in 2.2. However even we turn off that config, the behavior is still different than 2.2.

If the output data are rows, we still enable whole stage codegen for the scan node, which is different with 2.2, we should also fix it.

How was this patch tested?

existing tests.

@cloud-fan
Copy link
Contributor Author

@@ -61,6 +61,9 @@ case class InMemoryTableScanExec(
}) && !WholeStageCodegenExec.isTooManyFields(conf, relation.schema)
}

// TODO: revisit this. Shall we always turn off whole stage codegen if the output data are rows?
override def supportCodegen: Boolean = supportsBatch
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 2.4 we should look into this. My gut feeling is we don't need to enable whole stage codegen for scan nodes that output data as rows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can do more perf measurement after 2.3 release

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is safe to keep the same behavior of 2.2.

I'm not sure if enabling whole stage codegen can hurt performance for scan nodes, btw. We can revisit this, of course.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We confirmed that we got performance improvements in several cases.
I will revisit this with more cases after 2.3 release.

@gatorsmile
Copy link
Member

LGTM pending Jenkins.

@SparkQA
Copy link

SparkQA commented Feb 6, 2018

Test build #87088 has finished for PR 20513 at commit 8525b2c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Feb 6, 2018

Test build #87093 has finished for PR 20513 at commit 8525b2c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Feb 6, 2018

retest this please.

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kiszk
Copy link
Member

kiszk commented Feb 6, 2018

LGTM

@SparkQA
Copy link

SparkQA commented Feb 6, 2018

Test build #87097 has finished for PR 20513 at commit 8525b2c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 6, 2018

Test build #87108 has finished for PR 20513 at commit 880412f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Feb 6, 2018

Test build #87113 has finished for PR 20513 at commit 880412f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

java.lang.RuntimeException: [unresolved dependency: com.sun.jersey#jersey-json;1.14: configuration not found in com.sun.jersey#jersey-json;1.14: 'master(compile)'. Missing configuration: 'compile'. It was required from org.apache.hadoop#hadoop-yarn-common;2.6.5 compile]

Seems like some environmental problems, let's try again.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Feb 6, 2018

Test build #87118 has finished for PR 20513 at commit 880412f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

Thanks! Merged to master/2.3

asfgit pushed a commit that referenced this pull request Feb 6, 2018
…e reader

## What changes were proposed in this pull request?

#20483 tried to provide a way to turn off the new columnar cache reader, to restore the behavior in 2.2. However even we turn off that config, the behavior is still different than 2.2.

If the output data are rows, we still enable whole stage codegen for the scan node, which is different with 2.2, we should also fix it.

## How was this patch tested?

existing tests.

Author: Wenchen Fan <[email protected]>

Closes #20513 from cloud-fan/cache.

(cherry picked from commit ac7454c)
Signed-off-by: gatorsmile <[email protected]>
@asfgit asfgit closed this in ac7454c Feb 6, 2018
robert3005 pushed a commit to palantir/spark that referenced this pull request Feb 12, 2018
…e reader

## What changes were proposed in this pull request?

apache#20483 tried to provide a way to turn off the new columnar cache reader, to restore the behavior in 2.2. However even we turn off that config, the behavior is still different than 2.2.

If the output data are rows, we still enable whole stage codegen for the scan node, which is different with 2.2, we should also fix it.

## How was this patch tested?

existing tests.

Author: Wenchen Fan <[email protected]>

Closes apache#20513 from cloud-fan/cache.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants