
[SPARK-8422] [BUILD] [PROJECT INFRA] Add a module abstraction to dev/run-tests #6866

Closed

Conversation

JoshRosen (Contributor)

This patch builds upon #5694 to add a 'module' abstraction to the dev/run-tests script which groups together the per-module test logic, including the mapping from file paths to modules, the mapping from modules to test goals and build profiles, and the dependencies / relationships between modules.

This refactoring makes it much easier to increase the granularity of test modules, which will let us skip even more tests. It's also a prerequisite for other changes that will reduce test time, such as running subsets of the Python tests based on which files / modules have changed.

This patch also adds doctests for the new graph traversal / change mapping code.
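For readers who have not looked at the patch itself, a minimal sketch of what a module abstraction along these lines might look like (field names here are illustrative, not necessarily the exact ones in the patch):

```python
import re

class Module(object):
    """Groups one project's test logic: which source paths belong to it,
    which build profiles and test goals it needs, and which modules it
    depends on."""

    def __init__(self, name, dependencies, source_file_regexes,
                 build_profile_flags=(), sbt_test_goals=()):
        self.name = name
        self.dependencies = dependencies
        self.source_file_regexes = list(source_file_regexes)
        self.build_profile_flags = list(build_profile_flags)
        self.sbt_test_goals = list(sbt_test_goals)

    def contains_file(self, filename):
        # Map a changed file back to this module: the file belongs here
        # if its path matches one of the module's source patterns.
        return any(re.match(p, filename) for p in self.source_file_regexes)

sql = Module("sql", [], ["sql/"], ["-Phive"], ["sql/test", "hive/test"])
mllib = Module("mllib", [sql], ["mllib/"], sbt_test_goals=["mllib/test"])
```

With modules declared this way, the driver script can map a list of changed files to a set of modules, then walk the dependency graph to decide which test goals to run.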

@JoshRosen (Contributor, Author)

/cc @brennonyork

@SparkQA

SparkQA commented Jun 18, 2015

Test build #35071 has finished for PR 6866 at commit ee20c55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):

@SparkQA

SparkQA commented Jun 18, 2015

Test build #35079 has finished for PR 6866 at commit c108d09.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):

@JoshRosen force-pushed the more-dev-run-tests-refactoring branch from c108d09 to 7092d3e on June 18, 2015 at 06:57.
@SparkQA

SparkQA commented Jun 18, 2015

Test build #35100 has finished for PR 6866 at commit 7092d3e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):
    • case class CreateStruct(children: Seq[Expression]) extends Expression
    • case class Logarithm(left: Expression, right: Expression)
    • case class SetCommand(kv: Option[(String, Option[String])]) extends RunnableCommand with Logging

@JoshRosen (Contributor, Author)

/cc @tdas, do you think that I should define additional modules for the individual external streaming projects, such as the Kafka + Flume components? I'd be glad to do that here as part of this PR, since I'm going to have to push some additional documentation commits anyways.

TBH, I think the biggest wins are going to come from adding conditional test running logic to the Python tests, since those take 20-30 minutes to run, but I'm going to defer that to a separate PR / someone else, since we'll have to refactor python/run-tests first.

"streaming-mqtt/test",
"streaming-twitter/test",
"streaming-zeromq/test",
]
Contributor:

This should also run the pyspark tests; the same applies to sql and mllib.

Contributor (Author):

The intent was to have this be covered via transitive dependencies: the pyspark module (on line 162) depends on mllib, sql, and streaming, so changes in any of those modules will cause the python tests to be run.
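The transitive behavior described above can be sketched as a small fixed-point computation over the dependency graph (a simplified stand-in using a namedtuple, not the actual code from the patch):

```python
from collections import namedtuple

# Simplified stand-in for the real Module class: just a name plus a
# tuple of the modules it depends on.
Module = namedtuple("Module", ["name", "dependencies"])

sql = Module("sql", ())
streaming = Module("streaming", ())
mllib = Module("mllib", (sql, streaming))
pyspark = Module("pyspark", (mllib, sql, streaming))
ALL_MODULES = [sql, streaming, mllib, pyspark]

def modules_to_test(changed_modules):
    """Return the changed modules plus every module that transitively
    depends on one of them."""
    result = set(changed_modules)
    changed = True
    while changed:  # iterate until no new dependents are discovered
        changed = False
        for module in ALL_MODULES:
            if module not in result and any(d in result for d in module.dependencies):
                result.add(module)
                changed = True
    return result
```

So a change in sql alone still triggers the mllib and pyspark tests, which is why the pyspark module does not need to list those test goals directly.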

Contributor:

Oh, I see, that makes sense.

@tdas (Contributor)

tdas commented Jun 18, 2015

@JoshRosen Defining additional modules is a good idea, but it's still optional. Bundling them into the streaming module is fine, since a streaming module that runs only the streaming + external tests and nothing else is already going to be a huge win.

Though in case you do want to separate the external stuff into its own modules, you also have to consider doing that for Python. You have to be careful here: if it is not done right, it is easy to miss running the Python Kafka tests when the Scala Kafka code is updated.


mllib = Module(
name="mllib",
dependencies=[sql],
Contributor (Author):

I guess mllib also depends on streaming, but let me double-check.

Contributor (Author):

Yep, adding this.

@SparkQA

SparkQA commented Jun 19, 2015

Test build #35236 has finished for PR 6866 at commit df10e23.

  • This patch fails some tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):

@SparkQA

SparkQA commented Jun 19, 2015

Test build #35239 has finished for PR 6866 at commit e46539f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):

@JoshRosen (Contributor, Author)

It just occurred to me that ec2 might be a good module to add as well, since it technically doesn't depend on any other modules and only needs to run the Python style checks.
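As a rough sketch of what that could look like, an ec2 module would simply declare no JVM test goals at all (the `should_run_python_style_checks` field below is hypothetical, invented here for illustration; the patch may express this differently):

```python
class Module(object):
    # Minimal sketch of the module constructor, reduced to the fields
    # relevant here; not the exact signature from the patch.
    def __init__(self, name, dependencies, source_file_regexes,
                 sbt_test_goals=(), should_run_python_style_checks=False):
        self.name = name
        self.dependencies = dependencies
        self.source_file_regexes = list(source_file_regexes)
        self.sbt_test_goals = list(sbt_test_goals)
        self.should_run_python_style_checks = should_run_python_style_checks

# ec2/ contains only scripts, so a change there needs no sbt test run,
# just the Python style checks.
ec2 = Module(
    name="ec2",
    dependencies=[],
    source_file_regexes=["ec2/"],
    should_run_python_style_checks=True,
)
```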

dependencies=[],
source_file_regexes=[
"external/",
"extras/java8-tests/",
Contributor:

This shouldn't be there.

],
sbt_test_goals=[
"mllib/test",
"examples/test",
Contributor:

Is this necessary? We don't do this for other modules.

Contributor (Author):

Not necessary; removing it now.

@andrewor14 (Contributor)

looks great!

@JoshRosen (Contributor, Author)

The one part of the test code that I still find somewhat confusing is the logic for choosing which SBT profiles to enable in order to run the tests. If we want to run the hive tests, for instance, we need to run the tests with the -Phive profile; something similar holds for the kinesis-asl library.

I'm going to look into pulling this logic into the Module class so that it's easier to understand and extend.
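One way that could look, sketched under the assumption that each module declares its own profile flags (the field and function names here are illustrative, not the exact ones from the patch):

```python
class Module(object):
    def __init__(self, name, build_profile_flags=()):
        self.name = name
        # SBT/Maven profiles this module's tests require, e.g. -Phive
        # for the hive tests or -Pkinesis-asl for the kinesis-asl library.
        self.build_profile_flags = list(build_profile_flags)

def union_of_profiles(modules):
    """Collect the profile flags of every selected module, preserving
    order and dropping duplicates, so the driver script can pass the
    result straight to the build."""
    flags = []
    for module in modules:
        for flag in module.build_profile_flags:
            if flag not in flags:
                flags.append(flag)
    return flags

hive = Module("hive", ["-Phive"])
kinesis_asl = Module("streaming-kinesis-asl", ["-Pkinesis-asl"])
core = Module("core")
```

The profile-selection logic then falls out of the module definitions instead of being special-cased in the driver script.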

@SparkQA

SparkQA commented Jun 20, 2015

Test build #35334 has finished for PR 6866 at commit a86a953.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):

@SparkQA

SparkQA commented Jun 20, 2015

Test build #35355 has finished for PR 6866 at commit 75de450.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):

@JoshRosen (Contributor, Author)

Yay, this still passes tests! I've updated this to move the test build profile selection logic into the module system, which I think makes things a bit easier to understand. I've also done a bit of work to add modules for some of the external streaming subprojects.

I have lots of other planned enhancements for this script, but I'm going to defer them to future work. In the meantime, if folks think that this looks okay then I'd like to merge it into master ASAP.


pyspark = Module(
name="pyspark",
dependencies=[mllib, streaming, sql],
Contributor:

There are Python tests for streaming_kafka.

@davies (Contributor)

davies commented Jun 20, 2015

LGTM, only one small thing

asfgit closed this in 7a3c424 on Jun 20, 2015.
@davies (Contributor)

davies commented Jun 20, 2015

@JoshRosen I fixed it and merged into master!

asfgit pushed a commit that referenced this pull request Jun 22, 2015
This was introduced in #6866.
animeshbaranawal pushed a commit to animeshbaranawal/spark that referenced this pull request Jun 25, 2015