-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-7017][Build][Project Infra]: Refactor dev/run-tests into Python #5694
Conversation
Test build #30950 has started for PR 5694 at commit |
Test build #30950 has finished for PR 5694 at commit
|
import subprocess as sp | ||
|
||
# Set the Spark project root directory | ||
spark_proj_root = os.path.abspath("..") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable names and logic here are very clear, so I think we can do without all the comments in this whole starting block.
Looks like a great start! Left some comments mostly about Python style and organization. Will take a closer look next week at the actual logic and flow. |
sp.check_output(cmd) | ||
except sp.CalledProcessError as e: | ||
print "[error] running", e.cmd, "; received return code", e.returncode | ||
exit(e.returncode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need to import sys
and call this as sys.exit()
?
Just a naming suggestion - would be better to have an explicit .py file, and the shell file just calls that. |
…sh to call the run-tests.py script
|
||
def set_title_and_block(title, err_block): | ||
os.environ["CURRENT_BLOCK"] = error_codes[err_block] | ||
line_str = "".join(['='] * 72) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just be:
line_str = '=' * 72
In the near future, I guess we want to move towards also converting the |
"""Will call Maven in the current directory with the list of mvn_args passed | ||
in and returns the subprocess for any further processing""" | ||
|
||
return subprocess.Popen(["./build/mvn"] + mvn_args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to not just wait()
here? It seems like that's what we're always doing with the return value.
Also, if we want to raise an exception when this fails (which I think we do), we should change Popen()
to check_call()
, in which case no need for the wait()
either.
…output to check_call
@rxin Thanks and taken care of! |
Yup, that sounds fine. We definitely don't have to tackle everything in one go, but eventually yeah we should phase out all the Bash-isms. By the way, I don't see any recent updates from Jenkins on testing this PR since Jenkins is currently shutting down. Just a heads up: Jenkins adds an additional layer of complexity in that it messes with the terminal and can cause weird problems with output buffering and whatnot. So you'll definitely want to track how Jenkins is outputting results to screen as it runs tests to be sure things are working as expected. |
Roger that. Let me look into using |
https://amplab.cs.berkeley.edu/jenkins/view/Spark/ Looks like planned maintenance? |
Roger. I see it now. Will have a fix up shortly. |
Merged build triggered. |
Merged build started. |
Test build #35009 has started for PR 5694 at commit |
Test build #35008 has finished for PR 5694 at commit
|
Merged build finished. Test FAILed. |
jenkins, retest this please |
Merged build triggered. |
Merged build started. |
Test build #35010 has started for PR 5694 at commit |
Test build #35009 has finished for PR 5694 at commit
|
Merged build finished. Test PASSed. |
Test build #35010 has finished for PR 5694 at commit
|
Merged build finished. Test PASSed. |
doc_files]]) | ||
changed_core_files = set(changed_files).difference(top_level_project_files) | ||
|
||
if changed_core_files: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would also be nice to log a "Detected changes in Core" message in this branch.
Some minor nits aside, this looks good enough to me, so I'm going to merge it into master and will submit a series of followup PRs to perform more modularization / refactoring of the dependency graph logic. Thanks for working on this! |
Thanks for the PR merge @JoshRosen. I'll go ahead and make a hotfix branch to capture the last few nits you have above! |
|
||
def run_scala_tests_sbt(test_modules, test_profiles): | ||
# declare the variable for reference | ||
sbt_test_goals = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll end up concatenating a list to this if we only run the MLLib tests in the code below. This breaks PRB tests for MLLib, Streaming, and GraphX only PRs.
I'm considering reverting this patch or pushing a hotfix to address this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hotfixed in 165f52f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Josh, I'm out of pocket until tomorrow morning. Can you get a hot fix in for this? Immediately I'm not seeing what the issue is / code to fix it :/ I can figure it out tomorrow morning though if it can wait.
-----Original Message-----
From: Josh Rosen [[email protected]:[email protected]]
Sent: Wednesday, June 17, 2015 09:58 PM Eastern Standard Time
To: apache/spark
Cc: York, Brennon
Subject: Re: [spark] [SPARK-7017][Build][Project Infra]: Refactor dev/run-tests into Python (#5694)
In dev/run-tests.pyhttps://github.com//pull/5694#discussion_r32693784:
return changed_modules
+def run_scala_tests_maven(test_profiles):
- mvn_test_goals = ["test", "--fail-at-end"]
- profiles_and_goals = test_profiles + mvn_test_goals
- print "[info] Running Spark tests using Maven with these arguments:",
- print " ".join(profiles_and_goals)
- exec_maven(profiles_and_goals)
+def run_scala_tests_sbt(test_modules, test_profiles):
declare the variable for reference
- sbt_test_goals = None
We'll end up concatenating a list to this if we only run the MLLib tests in the code below. This breaks PRB tests for MLLib, Streaming, and GraphX only PRs.
I'm considering reverting this patch or pushing a hotfix to address this.
—
Reply to this email directly or view it on GitHubhttps://github.com//pull/5694/files#r32693784.
The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here was that some of the branches in the following code do sbt_test_goals += [..something..]
which will fail if sbt_test_goals == None
.
I pushed a hotfix which fixed this.
All, this is a first attempt at refactoring `dev/run-tests` into Python. Initially I merely converted all Bash calls over to Python, then moved to a much more modular approach (more functions, moved the calls around, etc.). What is here is the initial culmination and should provide a great base to various downstream issues (e.g. SPARK-7016, modularize / parallelize testing, etc.). Would love comments / suggestions for this initial first step! /cc srowen pwendell nchammas Author: Brennon York <[email protected]> Closes apache#5694 from brennonyork/SPARK-7017 and squashes the following commits: 154ed73 [Brennon York] updated finding java binary if JAVA_HOME not set 3922a85 [Brennon York] removed necessary passed in variable f9fbe54 [Brennon York] reverted doc test change 8135518 [Brennon York] removed the test check for documentation changes until jenkins can get updated 05d435b [Brennon York] added check for jekyll install 22edb78 [Brennon York] add check if jekyll isn't installed on the path 2dff136 [Brennon York] fixed pep8 whitespace errors 767a668 [Brennon York] fixed path joining issues, ensured docs actually build on doc changes c42cf9a [Brennon York] unpack set operations with splat (*) fb85a41 [Brennon York] fixed minor set bug 0379833 [Brennon York] minor doc addition to print the changed modules aa03d9e [Brennon York] added documentation builds as a top level test component, altered high level project changes to properly execute core tests only when necessary, changed variable names for simplicity ec1ae78 [Brennon York] minor name changes, bug fixes b7c72b9 [Brennon York] reverting streaming context 03fdd7b [Brennon York] fixed the tuple () wraps around example lambda 705d12e [Brennon York] changed example to comply with pep3113 supporting python3 60b3d51 [Brennon York] prepend rather than append onto PATH 7d2f5e2 [Brennon York] updated python tests to remove unused variable 2898717 [Brennon York] added a change to streaming test to check if it only runs streaming tests eb684b6 [Brennon York] fixed sbt_test_goals reference error db7ae6f [Brennon York] reverted SPARK_HOME from start of command 1ecca26 [Brennon York] fixed merge conflicts 2fcdfc0 [Brennon York] testing targte branch dump on jenkins 1f607b1 [Brennon York] finalizing revisions to modular tests 8afbe93 [Brennon York] made error codes a global 0629de8 [Brennon York] updated to refactor and remove various small bugs, removed pep8 complaints d90ab2d [Brennon York] fixed merge conflicts, ensured that for regular builds both core and sql tests always run b1248dc [Brennon York] exec python rather than running python and exiting with return code f9deba1 [Brennon York] python to python2 and removed newline 6d0a052 [Brennon York] incorporated merge conflicts with SPARK-7249 f950010 [Brennon York] removed building hive-0.12.0 per SPARK-6908 703f095 [Brennon York] fixed merge conflicts b1ca593 [Brennon York] reverted the sparkR test afeb093 [Brennon York] updated to make sparkR test fail 1dada6b [Brennon York] reverted pyspark test failure 9a592ec [Brennon York] reverted mima exclude issue, added pyspark test failure d825aa4 [Brennon York] revert build break, add mima break f041d8a [Brennon York] added space from commented import to now test build breaking 983f2a2 [Brennon York] comment out import to fail build test 2386785 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-7017 76335fb [Brennon York] reverted rat license issue for sparkconf e4a96cc [Brennon York] removed the import error and added license error, fixed the way run-tests and run-tests.py report their error codes 56d3cb9 [Brennon York] changed test back and commented out import to break compile b37328c [Brennon York] fixed typo and added default return is no error block was found in the environment 7613558 [Brennon York] updated to return the proper env variable for return codes a5bd445 [Brennon York] reverted license, changed test in shuffle to fail 803143a [Brennon York] removed license file for SparkContext b0b2604 [Brennon York] comment out import to see if build fails and returns properly 83e80ef [Brennon York] attempt at better python output when called from bash c095fa6 [Brennon York] removed another wait() call 26e18e8 [Brennon York] removed unnecessary wait() 07210a9 [Brennon York] minor doc string change for java version with namedtuple update ec03bf3 [Brennon York] added namedtuple for java version to add readability 2cb413b [Brennon York] upcased global variables, changes various calling methods from check_output to check_call 639f1e9 [Brennon York] updated with pep8 rules, fixed minor bugs, added run-tests file in bash to call the run-tests.py script 3c53a1a [Brennon York] uncomment the scala tests :) 6126c4f [Brennon York] refactored run-tests into python
|
||
test_modules = set(test_modules) | ||
|
||
hive_profiles = ("SQL" in test_modules) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this logic isn't quite right for the non-pull-request-builder SBT builds, since we always want to run the SQL tests in non-PRB builds but in those cases "ALL" will be the only test module. I'm going to fix this in my PR, but I think that it's critical that we get a fix for this in soon.
…run-tests This patch builds upon apache#5694 to add a 'module' abstraction to the `dev/run-tests` script which groups together the per-module test logic, including the mapping from file paths to modules, the mapping from modules to test goals and build profiles, and the dependencies / relationships between modules. This refactoring makes it much easier to increase the granularity of test modules, which will let us skip even more tests. It's also a prerequisite for other changes that will reduce test time, such as running subsets of the Python tests based on which files / modules have changed. This patch also adds doctests for the new graph traversal / change mapping code. Author: Josh Rosen <[email protected]> Closes apache#6866 from JoshRosen/more-dev-run-tests-refactoring and squashes the following commits: 75de450 [Josh Rosen] Use module system to determine which build profiles to enable. 4224da5 [Josh Rosen] Add documentation to Module. a86a953 [Josh Rosen] Clean up modules; add new modules for streaming external projects e46539f [Josh Rosen] Fix camel-cased endswith() 35a3052 [Josh Rosen] Enable Hive tests when running all tests df10e23 [Josh Rosen] update to reflect fact that no module depends on root 3670d50 [Josh Rosen] mllib should depend on streaming dc6f1c6 [Josh Rosen] Use changed files' extensions to decide whether to run style checks 7092d3e [Josh Rosen] Skip SBT tests if no test goals are specified 43a0ced [Josh Rosen] Minor fixes 3371441 [Josh Rosen] Test everything if nothing has changed (needed for non-PRB builds) 37f3fb3 [Josh Rosen] Remove doc profiles option, since it's not actually needed (see apache#6865) f53864b [Josh Rosen] Finish integrating module changes f0249bd [Josh Rosen] WIP
…adoop-1 to 1.2.1 PR #5694 reverted PR #6384 while refactoring `dev/run-tests` to `dev/run-tests.py`. Also, PR #6384 didn't bump Hadoop 1 version defined in POM. Author: Cheng Lian <[email protected]> Closes #7062 from liancheng/spark-7845 and squashes the following commits: c088b72 [Cheng Lian] Bumping default Hadoop version used in profile hadoop-1 to 1.2.1
All, this is a first attempt at refactoring
dev/run-tests
into Python. Initially I merely converted all Bash calls over to Python, then moved to a much more modular approach (more functions, moved the calls around, etc.). What is here is the initial culmination and should provide a great base to various downstream issues (e.g. SPARK-7016, modularize / parallelize testing, etc.). Would love comments / suggestions for this initial first step!/cc @srowen @pwendell @nchammas