
added compatibility for python 2.6 for ssh_read command #941

Closed
wants to merge 2 commits

Conversation

AtlasPilotPuppy
Contributor

https://issues.apache.org/jira/browse/SPARK-1990

There were some posts on the lists that spark-ec2 does not work with Python 2.6. In addition, we should check the Python version at the top of the script and exit if it's too old
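A minimal sketch of the version guard the description asks for (the constant name and error message here are illustrative, not the code that landed in spark_ec2.py):

```python
import sys

MIN_PYTHON = (2, 6)  # hypothetical floor; the script targets Python 2.6+

def python_is_supported(version_info=sys.version_info):
    """Return True when the interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON

# Exit early, at the top of the script, if the interpreter is too old.
if not python_is_supported():
    sys.stderr.write("spark-ec2 requires Python %d.%d or newer\n" % MIN_PYTHON)
    sys.exit(1)
```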

@AmplabJenkins

Can one of the admins verify this patch?

@mateiz
Contributor

mateiz commented Jun 3, 2014

What is the difference between Popen and check_output? Will the Popen hide error codes? We should make sure this doesn't break behavior, and ideally use the same code for both 2.6 and 2.7 if Popen always works.

Also, was this the only change needed for Python 2.6?

@mateiz
Contributor

mateiz commented Jun 3, 2014

Jenkins, test this please

@AtlasPilotPuppy
Contributor Author

Popen is the older way to do it. You can capture stderr if you want, or just let it raise (the default behavior). check_output is more Pythonic and has been available since Python 2.7; 2.6 and 2.7 could both use Popen.
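For reference, a Popen-based capture with check_output-like semantics that runs on both 2.6 and 2.7 looks roughly like this (a sketch, not the PR's code; `run_captured` is an illustrative name):

```python
import subprocess

def run_captured(cmd):
    """Capture stdout of cmd, raising on a nonzero exit like check_output."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    output, _ = proc.communicate()
    # Popen does not raise on a nonzero exit status by itself; this
    # explicit check is what preserves check_output's behavior.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return output
```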

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15364/

@mateiz
Contributor

mateiz commented Jun 3, 2014

Let's make 2.7 also use Popen then. It seems unnecessary to have two code paths here if they both result in the error being printed.

@AtlasPilotPuppy
Contributor Author

Agreed.

@mateiz
Contributor

mateiz commented Jun 3, 2014

Also, can you update the title of the pull request to say SPARK-1990 and to match the description in the JIRA? This will simplify bookkeeping later. (I'm assuming this fixes the whole JIRA, otherwise let me know if there are other parts missing.)

@pwendell
Contributor

pwendell commented Jun 3, 2014

Hey, one of the things about check_call is that it would throw an exception if the subprocess returned nonzero. Does this have the same behavior? This is actually somewhat important, because otherwise the script will happily continue having captured the wrong output, and it will throw an error downstream that is really hard to debug.

I wonder if we should just inline the definition of check_output from Python 2.7. That's what's suggested here:

http://stackoverflow.com/questions/4814970/subprocess-check-output-doesnt-seem-to-exist-python-2-6-5
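The Stack Overflow approach amounts to defining the function only when the stdlib lacks it, along these lines (a sketch in the spirit of that answer, not the exact code that was later merged):

```python
import subprocess

if hasattr(subprocess, "check_output"):
    check_output = subprocess.check_output  # Python 2.7+
else:
    def check_output(*popenargs, **kwargs):
        """Minimal stand-in for subprocess.check_output (added in 2.7)."""
        process = subprocess.Popen(*popenargs, stdout=subprocess.PIPE, **kwargs)
        output, _ = process.communicate()
        retcode = process.poll()
        if retcode:
            raise subprocess.CalledProcessError(retcode, popenargs[0])
        return output
```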

@AtlasPilotPuppy
Contributor Author

That might actually be a smarter idea. We could just extend subprocess to add check_output and use that.


@pwendell
Contributor

pwendell commented Jun 3, 2014

Yeah, but just to be clear, I was suggesting adding a function check_call in our script rather than necessarily extending the subprocess module. I guess maybe we could dynamically add the method to subprocess as well (?)

@AtlasPilotPuppy
Contributor Author

We could dynamically add the method to subprocess within the scope, or create a function in the scope; both sound like acceptable solutions.

@pwendell
Contributor

pwendell commented Jun 4, 2014

To keep it simple, I'd prefer to just create a new function in the scope.

@AtlasPilotPuppy
Contributor Author

I just created _check_output within the module. It fixes the issue in Python 2.6 and 2.7.

@AtlasPilotPuppy
Contributor Author

Jenkins, test this please

@pwendell
Contributor

Jenkins, test this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15842/

asfgit closed this in 8cd04c3 on Jun 17, 2014
asfgit pushed a commit that referenced this pull request Jun 17, 2014
https://issues.apache.org/jira/browse/SPARK-1990

There were some posts on the lists that spark-ec2 does not work with Python 2.6. In addition, we should check the Python version at the top of the script and exit if it's too old

Author: Anant <[email protected]>

Closes #941 from anantasty/SPARK-1990 and squashes the following commits:

4ca441d [Anant] Implmented check_optput withinthe module to work with python 2.6
c6ed85c [Anant] added compatibility for python 2.6 for ssh_read command

Conflicts:

	ec2/spark_ec2.py
@pwendell
Contributor

Thanks - I've merged this and backported it into 0.9 and 1.0.

pwendell added a commit to pwendell/spark that referenced this pull request Jun 17, 2014
@pwendell
Contributor

@anantasty FYI this had a bug in it for which I've added a hot fix. I should have tested this before committing.

@AtlasPilotPuppy
Contributor Author

@pwendell Ouch! my bad. Thanks for the fix.

asfgit pushed a commit that referenced this pull request Jun 17, 2014
This patch should have qualified the use of PIPE. This needs to be back ported into 0.9 and 1.0.

Author: Patrick Wendell <[email protected]>

Closes #1108 from pwendell/hotfix and squashes the following commits:

711c58d [Patrick Wendell] HOTFIX: bug caused by #941
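The hotfix's commit message suggests the backported helper referred to a bare `PIPE` while only `import subprocess` was in scope, which fails with a NameError at call time. A reconstruction of the likely shape of the bug (not the actual diff; `buggy_read` is an illustrative name):

```python
import subprocess

def buggy_read(cmd):
    # Bare PIPE is undefined here: only `import subprocess` is in scope,
    # so the qualified spelling subprocess.PIPE is required.
    return subprocess.Popen(cmd, stdout=PIPE)

try:
    buggy_read(["true"])
except NameError as exc:
    print("fails at call time: %s" % exc)
```

Because the NameError only fires when the function is called, the bug slips past an import-only smoke test, which is consistent with it surviving until after the merge.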
asfgit pushed a commit that referenced this pull request Jun 17, 2014
This patch should have qualified the use of PIPE. This needs to be back ported into 0.9 and 1.0.

Author: Patrick Wendell <[email protected]>

Closes #1108 from pwendell/hotfix and squashes the following commits:

711c58d [Patrick Wendell] HOTFIX: bug caused by #941

(cherry picked from commit b2ebf42)
Signed-off-by: Xiangrui Meng <[email protected]>
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
chuckchen pushed a commit to chuckchen/spark that referenced this pull request Jun 25, 2015
Add "org.apache." prefix to packages in spark-class
(cherry picked from commit f06f2da)

Signed-off-by: Reynold Xin <[email protected]>
wangyum added a commit that referenced this pull request May 26, 2023
… expressions from aggregate expressions without aggregate function (#941)

* [SPARK-34581][SQL] Don't optimize out grouping expressions from aggregate expressions without aggregate function

### What changes were proposed in this pull request?
This PR adds a new rule `PullOutGroupingExpressions` to pull out complex grouping expressions to a `Project` node under an `Aggregate`. These expressions are then referenced in both grouping expressions and aggregate expressions without aggregate functions to ensure that optimization rules don't change the aggregate expressions to invalid ones that no longer refer to any grouping expressions.

### Why are the changes needed?
If aggregate expressions (without aggregate functions) in an `Aggregate` node are complex then the `Optimizer` can optimize out grouping expressions from them and so making aggregate expressions invalid.

Here is a simple example:
```
SELECT not(t.id IS NULL) , count(*)
FROM t
GROUP BY t.id IS NULL
```
In this case the `BooleanSimplification` rule does this:
```
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.BooleanSimplification ===
!Aggregate [isnull(id#222)], [NOT isnull(id#222) AS (NOT (id IS NULL))#226, count(1) AS c#224L]   Aggregate [isnull(id#222)], [isnotnull(id#222) AS (NOT (id IS NULL))#226, count(1) AS c#224L]
 +- Project [value#219 AS id#222]                                                                 +- Project [value#219 AS id#222]
    +- LocalRelation [value#219]                                                                     +- LocalRelation [value#219]
```
where `NOT isnull(id#222)` is optimized to `isnotnull(id#222)` and so it no longer refers to any grouping expression.

Before this PR:
```
== Optimized Logical Plan ==
Aggregate [isnull(id#222)], [isnotnull(id#222) AS (NOT (id IS NULL))#234, count(1) AS c#232L]
+- Project [value#219 AS id#222]
   +- LocalRelation [value#219]
```
and running the query throws an error:
```
Couldn't find id#222 in [isnull(id#222)#230,count(1)#226L]
java.lang.IllegalStateException: Couldn't find id#222 in [isnull(id#222)#230,count(1)#226L]
```

After this PR:
```
== Optimized Logical Plan ==
Aggregate [_groupingexpression#233], [NOT _groupingexpression#233 AS (NOT (id IS NULL))#230, count(1) AS c#228L]
+- Project [isnull(value#219) AS _groupingexpression#233]
   +- LocalRelation [value#219]
```
and the query works.

### Does this PR introduce _any_ user-facing change?
Yes, the query works.

### How was this patch tested?
Added new UT.

Closes #32396 from peter-toth/SPARK-34581-keep-grouping-expressions-2.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

(cherry picked from commit cfc0495)

* [SPARK-34037][SQL] Remove unnecessary upcasting for Avg & Sum which handle by themself internally

### What changes were proposed in this pull request?
The type-coercion for numeric types of average and sum is not necessary at all, as the resultType and sumType can prevent the overflow.

### Why are the changes needed?

rm unnecessary logic which may cause potential performance regressions

### Does this PR introduce _any_ user-facing change?

no
### How was this patch tested?

tpcds tests for plan

Closes #31079 from yaooqinn/SPARK-34037.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>

(cherry picked from commit a235c3b)

Co-authored-by: Peter Toth <[email protected]>