
added compatibility for python 2.6 for ssh_read command #941

Closed
wants to merge 2 commits

Conversation

AtlasPilotPuppy
Contributor

https://issues.apache.org/jira/browse/SPARK-1990

There were some posts on the lists that spark-ec2 does not work with Python 2.6. In addition, we should check the Python version at the top of the script and exit if it's too old
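A minimal sketch of the version guard the description asks for (the constant name and error message here are illustrative, not the code that landed in spark_ec2.py):

```python
import sys

MIN_PYTHON = (2, 6)  # hypothetical floor; the script targets Python 2.6+

def python_is_supported(version_info=sys.version_info):
    """Return True when the interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON

# Exit early, at the top of the script, if the interpreter is too old.
if not python_is_supported():
    sys.stderr.write("spark-ec2 requires Python %d.%d or newer\n" % MIN_PYTHON)
    sys.exit(1)
```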

@AmplabJenkins

Can one of the admins verify this patch?

@mateiz
Contributor

mateiz commented Jun 3, 2014

What is the difference between Popen and check_output? Will the Popen hide error codes? We should make sure this doesn't break behavior, and ideally use the same code for both 2.6 and 2.7 if Popen always works.

Also, was this the only change needed for Python 2.6?

@mateiz
Contributor

mateiz commented Jun 3, 2014

Jenkins, test this please

@AtlasPilotPuppy
Contributor Author

Popen is the older way to do it. You can capture stderr if you want, or just let it raise (the default behavior). check_output is more Pythonic and has been available since Python 2.7; 2.6 and 2.7 could both use Popen.
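For reference, a Popen-based capture with check_output-like semantics that runs on both 2.6 and 2.7 looks roughly like this (a sketch, not the PR's code; `run_captured` is an illustrative name):

```python
import subprocess

def run_captured(cmd):
    """Capture stdout of cmd, raising on a nonzero exit like check_output."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    output, _ = proc.communicate()
    # Popen does not raise on a nonzero exit status by itself; this
    # explicit check is what preserves check_output's behavior.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return output
```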

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15364/

@mateiz
Contributor

mateiz commented Jun 3, 2014

Let's make 2.7 also use Popen then. It seems unnecessary to have two code paths here if they both result in the error being printed.

@AtlasPilotPuppy
Contributor Author

Agreed.

@mateiz
Contributor

mateiz commented Jun 3, 2014

Also, can you update the title of the pull request to say SPARK-1990 and to match the description in the JIRA? This will simplify bookkeeping later. (I'm assuming this fixes the whole JIRA, otherwise let me know if there are other parts missing.)

@pwendell
Contributor

pwendell commented Jun 3, 2014

Hey, one of the things about check_call is that it would throw an exception if the subprocess returned nonzero. Does this have the same behavior? This is actually somewhat important, because otherwise the script will happily continue having captured the wrong output, and it will throw an error downstream that is really hard to debug.

I wonder if we should just inline the definition of check_output from Python 2.7. That's what's suggested here:

http://stackoverflow.com/questions/4814970/subprocess-check-output-doesnt-seem-to-exist-python-2-6-5
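The Stack Overflow approach amounts to defining the function only when the stdlib lacks it, along these lines (a sketch in the spirit of that answer, not the exact code that was later merged):

```python
import subprocess

if hasattr(subprocess, "check_output"):
    check_output = subprocess.check_output  # Python 2.7+
else:
    def check_output(*popenargs, **kwargs):
        """Minimal stand-in for subprocess.check_output (added in 2.7)."""
        process = subprocess.Popen(*popenargs, stdout=subprocess.PIPE, **kwargs)
        output, _ = process.communicate()
        retcode = process.poll()
        if retcode:
            raise subprocess.CalledProcessError(retcode, popenargs[0])
        return output
```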

@AtlasPilotPuppy
Contributor Author

That might actually be a smarter idea. We could just extend subprocess to add check_output and use that.


@pwendell
Contributor

pwendell commented Jun 3, 2014

Yeah, but just to be clear, I was suggesting adding a function check_call in our script rather than necessarily extending the subprocess module. I guess maybe we could dynamically add the method to subprocess as well (?)

@AtlasPilotPuppy
Contributor Author

We could dynamically add the method to subprocess within the scope, or create a function in the scope; both sound like acceptable solutions.

@pwendell
Contributor

pwendell commented Jun 4, 2014

To keep it simple, I'd prefer to just create a new function in the scope.

@AtlasPilotPuppy
Contributor Author

I just created _check_output within the module. It fixes the issue in Python 2.6 and 2.7.

@AtlasPilotPuppy
Contributor Author

Jenkins, test this please

@pwendell
Contributor

Jenkins, test this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15842/

asfgit closed this in 8cd04c3 on Jun 17, 2014
asfgit pushed a commit that referenced this pull request Jun 17, 2014
https://issues.apache.org/jira/browse/SPARK-1990

There were some posts on the lists that spark-ec2 does not work with Python 2.6. In addition, we should check the Python version at the top of the script and exit if it's too old

Author: Anant <[email protected]>

Closes #941 from anantasty/SPARK-1990 and squashes the following commits:

4ca441d [Anant] Implmented check_optput withinthe module to work with python 2.6
c6ed85c [Anant] added compatibility for python 2.6 for ssh_read command

Conflicts:

	ec2/spark_ec2.py
@pwendell
Contributor

Thanks - I've merged this and backported it into 0.9 and 1.0.

pwendell added a commit to pwendell/spark that referenced this pull request Jun 17, 2014
@pwendell
Contributor

@anantasty FYI this had a bug in it for which I've added a hot fix. I should have tested this before committing.

@AtlasPilotPuppy
Contributor Author

@pwendell Ouch! my bad. Thanks for the fix.

asfgit pushed a commit that referenced this pull request Jun 17, 2014
This patch should have qualified the use of PIPE. This needs to be back ported into 0.9 and 1.0.

Author: Patrick Wendell <[email protected]>

Closes #1108 from pwendell/hotfix and squashes the following commits:

711c58d [Patrick Wendell] HOTFIX: bug caused by #941
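The hotfix's commit message suggests the backported helper referred to a bare `PIPE` while only `import subprocess` was in scope, which fails with a NameError at call time. A reconstruction of the likely shape of the bug (not the actual diff; `buggy_read` is an illustrative name):

```python
import subprocess

def buggy_read(cmd):
    # Bare PIPE is undefined here: only `import subprocess` is in scope,
    # so the qualified spelling subprocess.PIPE is required.
    return subprocess.Popen(cmd, stdout=PIPE)

try:
    buggy_read(["true"])
except NameError as exc:
    print("fails at call time: %s" % exc)
```

Because the NameError only fires when the function is called, the bug slips past an import-only smoke test, which is consistent with it surviving until after the merge.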
asfgit pushed a commit that referenced this pull request Jun 17, 2014
This patch should have qualified the use of PIPE. This needs to be back ported into 0.9 and 1.0.

Author: Patrick Wendell <[email protected]>

Closes #1108 from pwendell/hotfix and squashes the following commits:

711c58d [Patrick Wendell] HOTFIX: bug caused by #941

(cherry picked from commit b2ebf42)
Signed-off-by: Xiangrui Meng <[email protected]>
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
chuckchen pushed a commit to chuckchen/spark that referenced this pull request Jun 25, 2015
Add "org.apache." prefix to packages in spark-class
(cherry picked from commit f06f2da)

Signed-off-by: Reynold Xin <[email protected]>
wangyum added a commit that referenced this pull request May 26, 2023
… expressions from aggregate expressions without aggregate function (#941)

* [SPARK-34581][SQL] Don't optimize out grouping expressions from aggregate expressions without aggregate function

### What changes were proposed in this pull request?
This PR adds a new rule `PullOutGroupingExpressions` to pull out complex grouping expressions to a `Project` node under an `Aggregate`. These expressions are then referenced in both grouping expressions and aggregate expressions without aggregate functions to ensure that optimization rules don't change the aggregate expressions to invalid ones that no longer refer to any grouping expressions.

### Why are the changes needed?
If aggregate expressions (without aggregate functions) in an `Aggregate` node are complex then the `Optimizer` can optimize out grouping expressions from them and so making aggregate expressions invalid.

Here is a simple example:
```
SELECT not(t.id IS NULL) , count(*)
FROM t
GROUP BY t.id IS NULL
```
In this case the `BooleanSimplification` rule does this:
```
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.BooleanSimplification ===
!Aggregate [isnull(id#222)], [NOT isnull(id#222) AS (NOT (id IS NULL))#226, count(1) AS c#224L]   Aggregate [isnull(id#222)], [isnotnull(id#222) AS (NOT (id IS NULL))#226, count(1) AS c#224L]
 +- Project [value#219 AS id#222]                                                                 +- Project [value#219 AS id#222]
    +- LocalRelation [value#219]                                                                     +- LocalRelation [value#219]
```
where `NOT isnull(id#222)` is optimized to `isnotnull(id#222)` and so it no longer refers to any grouping expression.

Before this PR:
```
== Optimized Logical Plan ==
Aggregate [isnull(id#222)], [isnotnull(id#222) AS (NOT (id IS NULL))#234, count(1) AS c#232L]
+- Project [value#219 AS id#222]
   +- LocalRelation [value#219]
```
and running the query throws an error:
```
Couldn't find id#222 in [isnull(id#222)#230,count(1)#226L]
java.lang.IllegalStateException: Couldn't find id#222 in [isnull(id#222)#230,count(1)#226L]
```

After this PR:
```
== Optimized Logical Plan ==
Aggregate [_groupingexpression#233], [NOT _groupingexpression#233 AS (NOT (id IS NULL))#230, count(1) AS c#228L]
+- Project [isnull(value#219) AS _groupingexpression#233]
   +- LocalRelation [value#219]
```
and the query works.

### Does this PR introduce _any_ user-facing change?
Yes, the query works.

### How was this patch tested?
Added new UT.

Closes #32396 from peter-toth/SPARK-34581-keep-grouping-expressions-2.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

(cherry picked from commit cfc0495)

* [SPARK-34037][SQL] Remove unnecessary upcasting for Avg & Sum which handle by themself internally

### What changes were proposed in this pull request?
The type-coercion for numeric types of average and sum is not necessary at all, as the resultType and sumType can prevent the overflow.

### Why are the changes needed?

rm unnecessary logic which may cause potential performance regressions

### Does this PR introduce _any_ user-facing change?

no
### How was this patch tested?

tpcds tests for plan

Closes #31079 from yaooqinn/SPARK-34037.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>

(cherry picked from commit a235c3b)

Co-authored-by: Peter Toth <[email protected]>