Added compatibility for Python 2.6 for ssh_read command #941
Conversation
Can one of the admins verify this patch?
What is the difference between Popen and check_output? Will Popen hide error codes? We should make sure this doesn't break behavior, and ideally use the same code for both 2.6 and 2.7 if Popen always works. Also, was this the only change needed for Python 2.6?
Jenkins, test this please
Popen is the old way to do it. You could capture stderr if you want, or just let it raise (the default behavior). check_output is more Pythonic and has been available since Python 2.7; 2.6 and 2.7 could both use Popen.
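For reference, a minimal sketch of how the two approaches line up (the command below is a placeholder, not the script's actual ssh invocation):

```python
import subprocess

# Hypothetical command; the real script builds an ssh command line.
cmd = ["ssh", "-o", "StrictHostKeyChecking=no", "somehost", "uptime"]

# Python 2.7+: captures stdout and raises CalledProcessError on a non-zero exit.
# out = subprocess.check_output(cmd)

# Works on 2.6 and 2.7 alike; the error handling just has to be spelled out.
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
out, _ = proc.communicate()
if proc.returncode != 0:
    raise subprocess.CalledProcessError(proc.returncode, cmd)
```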
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
Let's make 2.7 also use Popen then. It seems unnecessary to have two code paths here if they both result in the error being printed.
Agreed.
Also, can you update the title of the pull request to say SPARK-1990 and to match the description in the JIRA? This will simplify bookkeeping later. (I'm assuming this fixes the whole JIRA; otherwise let me know if there are other parts missing.)
Hey, one other thing: I wonder if we should just inline the definition of check_output from Python 2.7. That's what's suggested here: http://stackoverflow.com/questions/4814970/subprocess-check-output-doesnt-seem-to-exist-python-2-6-5
That might actually be a smarter idea. We could just extend subprocess to include it.
Yeah, but just to be clear, I was suggesting adding a check_output function in our script rather than necessarily extending the subprocess module. I guess maybe we could dynamically add the method to subprocess as well (?)
We could dynamically add the method to subprocess within the scope, or just define a new function.
To keep it simple, I'd prefer to just create a new function in the scope.
I just created _check_output within the module. This fixes the issue in Python 2.6 and 2.7.
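For readers following along, the widely used backport recipe (the one the Stack Overflow link above points at) looks roughly like this; the helper that actually landed in spark_ec2.py may differ in its details:

```python
import subprocess

def _check_output(*popenargs, **kwargs):
    # Backport of subprocess.check_output so 2.6 and 2.7 share one code path:
    # run the command, capture stdout, and fail loudly on a non-zero exit code.
    process = subprocess.Popen(stdout=subprocess.PIPE, *popenargs, **kwargs)
    output, _ = process.communicate()
    retcode = process.poll()
    if retcode:
        cmd = kwargs.get("args", popenargs[0])
        raise subprocess.CalledProcessError(retcode, cmd)
    return output

# The module-patching alternative discussed above would instead be something like:
#     if not hasattr(subprocess, "check_output"):
#         subprocess.check_output = _check_output
```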
Jenkins, test this please
Jenkins, test this please.
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
https://issues.apache.org/jira/browse/SPARK-1990
There were some posts on the lists that spark-ec2 does not work with Python 2.6. In addition, we should check the Python version at the top of the script and exit if it's too old.
Author: Anant <[email protected]>
Closes #941 from anantasty/SPARK-1990 and squashes the following commits:
4ca441d [Anant] Implmented check_optput withinthe module to work with python 2.6
c6ed85c [Anant] added compatibility for python 2.6 for ssh_read command
Conflicts:
ec2/spark_ec2.py
Thanks - I've merged this and backported it into 0.9 and 1.0.
@anantasty FYI this had a bug in it for which I've added a hot fix. I should have tested this before committing. |
@pwendell Ouch! My bad. Thanks for the fix.
This patch should have qualified the use of PIPE. This needs to be backported into 0.9 and 1.0.
Author: Patrick Wendell <[email protected]>
Closes #1108 from pwendell/hotfix and squashes the following commits:
711c58d [Patrick Wendell] HOTFIX: bug caused by #941
(cherry picked from commit b2ebf42) Signed-off-by: Xiangrui Meng <[email protected]>
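For anyone hitting the same failure: "qualifying PIPE" presumably means referencing it through the subprocess module rather than as a bare name. A minimal before/after illustration, not the exact diff:

```python
import subprocess

# Before (sketch): PIPE is a bare name, so unless the script does
# "from subprocess import PIPE" this line raises NameError at runtime.
#   process = subprocess.Popen(cmd, stdout=PIPE)

# After: PIPE is qualified with the module that is actually imported.
process = subprocess.Popen(["echo", "ok"], stdout=subprocess.PIPE)
out, _ = process.communicate()
```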
Add "org.apache." prefix to packages in spark-class (cherry picked from commit f06f2da) Signed-off-by: Reynold Xin <[email protected]>
https://issues.apache.org/jira/browse/SPARK-1990
There were some posts on the lists that spark-ec2 does not work with Python 2.6. In addition, we should check the Python version at the top of the script and exit if it's too old.
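A hedged sketch of the version guard described here (the message and placement are illustrative, not the script's actual code):

```python
import sys

# Exit early with a clear message if the interpreter is too old for spark-ec2.
if sys.version_info < (2, 6):
    sys.stderr.write("spark-ec2 requires Python 2.6 or newer\n")
    sys.exit(1)
```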