[SPARK-8422] [BUILD] [PROJECT INFRA] Add a module abstraction to dev/run-tests #6866
Conversation
/cc @brennonyork
Test build #35071 has finished for PR 6866 at commit
Test build #35079 has finished for PR 6866 at commit
Force-pushed from c108d09 to 7092d3e.
Test build #35100 has finished for PR 6866 at commit
/cc @tdas, do you think that I should define additional modules for the individual external streaming projects, such as the Kafka + Flume components? I'd be glad to do that here as part of this PR, since I'm going to have to push some additional documentation commits anyways. TBH, I think the biggest wins are going to come from adding conditional test running logic to the Python tests, since those take 20-30 minutes to run, but I'm going to defer that to a separate PR / someone else, since we'll have to refactor
"streaming-mqtt/test", | ||
"streaming-twitter/test", | ||
"streaming-zeromq/test", | ||
] |
This should also run the pyspark tests; the same goes for sql and mllib.
The intent was to have this be covered via transitive dependencies: the pyspark module (on line 162) depends on mllib, sql, and streaming, so changes in any of those modules will cause the Python tests to be run.
Oh, I see, that makes sense.
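The transitive-dependency behavior described in this exchange could be implemented roughly as follows. This is a minimal sketch rather than the PR's actual code; it only assumes Module objects with a dependencies list, as in the fragments quoted in this review:

```python
# Sketch: start from the modules whose files changed and keep adding any
# module that depends on an already-selected module, so that e.g. a change
# in sql or mllib also causes the pyspark tests to run.
def modules_to_test(changed_modules, all_modules):
    to_test = set(changed_modules)
    added = True
    while added:
        added = False
        for module in all_modules:
            if module not in to_test and any(dep in to_test for dep in module.dependencies):
                to_test.add(module)
                added = True
    return to_test
```

With pyspark depending on mllib, sql, and streaming, a change under mllib would select mllib and the loop would then pull in pyspark, so the Python tests run without having to list them in every module's test goals.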
@JoshRosen Defining additional modules is a good idea, but it's still optional. Bundling them in the streaming module is fine, since a streaming module that runs only the streaming + external tests and nothing else is already going to be a huge win. That said, if you do want to separate the external projects into their own modules, you also have to consider doing that for Python. We have to be careful here: if it's not done right, we could easily miss running the Python Kafka tests when the Scala Kafka code is updated.
mllib = Module(
    name="mllib",
    dependencies=[sql],
I guess mllib also depends on streaming, but let me double-check.
Yep, adding this.
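For concreteness, a hypothetical version of the mllib module definition after this change might look like the following. The streaming dependency reflects the comment above; the source_file_regexes value is illustrative, not the merged one:

```python
mllib = Module(
    name="mllib",
    dependencies=[streaming, sql],     # streaming added per the review comment
    source_file_regexes=["mllib/"],    # illustrative path pattern
    sbt_test_goals=["mllib/test"],
)
```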
Test build #35236 has finished for PR 6866 at commit
Test build #35239 has finished for PR 6866 at commit
It just occurred to me that
    dependencies=[],
    source_file_regexes=[
        "external/",
        "extras/java8-tests/",
This shouldn't be here.
    ],
    sbt_test_goals=[
        "mllib/test",
        "examples/test",
Is this necessary? We don't do this for other modules.
Not necessary; removing it now.
Looks great!
The one part of the test code that I still find somewhat confusing is the logic for choosing which SBT profiles to enable in order to run the tests. If we want to run the Hive tests, for instance, we need to run the tests with the Hive build profile enabled. I'm going to look into pulling this logic into the module definitions themselves.
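One way to fold profile selection into the module definitions is sketched below, using a hypothetical build_profile_flags field and an illustrative Hive module; the flag and path values are assumptions, not the script's actual contents:

```python
# Sketch: each module declares the SBT profile flags its tests require,
# and the runner takes the union of flags across the selected modules.
hive = Module(
    name="hive",
    dependencies=[sql],
    source_file_regexes=["sql/hive/"],   # illustrative path pattern
    build_profile_flags=["-Phive"],      # assumed flag needed for Hive tests
    sbt_test_goals=["hive/test"],
)

def build_profiles_for(modules_to_test):
    """Return the deduplicated profile flags needed by the selected modules."""
    flags = []
    for module in modules_to_test:
        for flag in getattr(module, "build_profile_flags", []):
            if flag not in flags:
                flags.append(flag)
    return flags
```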
Test build #35334 has finished for PR 6866 at commit
Test build #35355 has finished for PR 6866 at commit
Yay, this still passes tests! I've updated this to move the test build profile selection logic into the module system, which I think makes things a bit easier to understand. I've also done a bit of work to add modules for some of the external streaming subprojects. I have lots of other planned enhancements for this script, but I'm going to defer them to future work. In the meantime, if folks think that this looks okay then I'd like to merge it into master ASAP.
pyspark = Module(
    name="pyspark",
    dependencies=[mllib, streaming, sql],
There are Python tests for streaming_kafka.
LGTM, only one small thing.
@JoshRosen I fixed it and merged it into master!
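For illustration, a fix along the lines of the streaming_kafka comment above might have looked roughly like this. It is a sketch only, assuming a separate streaming_kafka module exists, and not necessarily what was merged:

```python
pyspark = Module(
    name="pyspark",
    # streaming_kafka included so that Scala Kafka changes also trigger the
    # Python Kafka tests, per the concern raised earlier in the thread
    dependencies=[mllib, streaming, streaming_kafka, sql],
    source_file_regexes=["python/"],   # illustrative path pattern
)
```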
This was introduced in apache#6866.
This patch builds upon #5694 to add a 'module' abstraction to the dev/run-tests script which groups together the per-module test logic, including the mapping from file paths to modules, the mapping from modules to test goals and build profiles, and the dependencies / relationships between modules.

This refactoring makes it much easier to increase the granularity of test modules, which will let us skip even more tests. It's also a prerequisite for other changes that will reduce test time, such as running subsets of the Python tests based on which files / modules have changed.
This patch also adds doctests for the new graph traversal / change mapping code.
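Putting the pieces from this conversation together, the abstraction could be sketched roughly as follows. The field names mirror the fragments quoted above, while contains_file, changed_modules, and the catch-all root module are illustrative rather than the script's actual API:

```python
import re


class Module(object):
    """Groups one project's test logic: the file paths that belong to it,
    the SBT test goals and build profiles its tests need, and the other
    modules it depends on."""

    def __init__(self, name, dependencies, source_file_regexes,
                 sbt_test_goals=(), build_profile_flags=()):
        self.name = name
        self.dependencies = list(dependencies)
        self.source_file_patterns = [re.compile(r) for r in source_file_regexes]
        self.sbt_test_goals = list(sbt_test_goals)
        self.build_profile_flags = list(build_profile_flags)

    def contains_file(self, filename):
        return any(p.match(filename) for p in self.source_file_patterns)


def changed_modules(all_modules, changed_files, root_module):
    """Map changed file paths to the modules that own them; files matched by
    no module fall back to a catch-all root module (i.e. run everything)."""
    changed = set()
    for filename in changed_files:
        owners = [m for m in all_modules if m.contains_file(filename)]
        changed.update(owners if owners else [root_module])
    return changed
```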