Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-7161][History Server] Provide REST api to download event logs fro... #5792

Closed

Conversation

harishreedharan
Copy link
Contributor

...m History Server

This PR adds a new API that allows the user to download event logs for an application as a zip file. APIs have been added to download all logs for a given application or just for a specific attempt.

This also add an additional method to the ApplicationHistoryProvider to get the raw files, zipped.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented Apr 29, 2015

Test build #31342 has started for PR 5792 at commit 1db0655.

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31341/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Apr 30, 2015

Test build #31342 has finished for PR 5792 at commit 1db0655.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch adds the following new dependencies:
    • spark-unsafe_2.10-1.4.0-SNAPSHOT.jar

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31342/
Test FAILed.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented Apr 30, 2015

Test build #31460 has started for PR 5792 at commit a991413.

@SparkQA
Copy link

SparkQA commented Apr 30, 2015

Test build #31460 has finished for PR 5792 at commit a991413.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31460/
Test PASSed.

@AmplabJenkins
Copy link

Build triggered.

@AmplabJenkins
Copy link

Build started.

@SparkQA
Copy link

SparkQA commented May 4, 2015

Test build #31792 has started for PR 5792 at commit d824d4f.

@SparkQA
Copy link

SparkQA commented May 5, 2015

Test build #31792 has finished for PR 5792 at commit d824d4f.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.
  • This patch adds the following new dependencies:
    • RoaringBitmap-0.4.5.jar
    • activation-1.1.jar
    • akka-actor_2.10-2.3.4-spark.jar
    • akka-remote_2.10-2.3.4-spark.jar
    • akka-slf4j_2.10-2.3.4-spark.jar
    • aopalliance-1.0.jar
    • arpack_combined_all-0.1.jar
    • avro-1.7.7.jar
    • breeze-macros_2.10-0.11.2.jar
    • breeze_2.10-0.11.2.jar
    • chill-java-0.5.0.jar
    • chill_2.10-0.5.0.jar
    • commons-beanutils-1.7.0.jar
    • commons-beanutils-core-1.8.0.jar
    • commons-cli-1.2.jar
    • commons-codec-1.10.jar
    • commons-collections-3.2.1.jar
    • commons-compress-1.4.1.jar
    • commons-configuration-1.6.jar
    • commons-digester-1.8.jar
    • commons-httpclient-3.1.jar
    • commons-io-2.1.jar
    • commons-lang-2.5.jar
    • commons-lang3-3.3.2.jar
    • commons-math-2.1.jar
    • commons-math3-3.4.1.jar
    • commons-net-2.2.jar
    • compress-lzf-1.0.0.jar
    • config-1.2.1.jar
    • core-1.1.2.jar
    • curator-client-2.4.0.jar
    • curator-framework-2.4.0.jar
    • curator-recipes-2.4.0.jar
    • gmbal-api-only-3.0.0-b023.jar
    • grizzly-framework-2.1.2.jar
    • grizzly-http-2.1.2.jar
    • grizzly-http-server-2.1.2.jar
    • grizzly-http-servlet-2.1.2.jar
    • grizzly-rcm-2.1.2.jar
    • groovy-all-2.3.7.jar
    • guava-14.0.1.jar
    • guice-3.0.jar
    • hadoop-annotations-2.2.0.jar
    • hadoop-auth-2.2.0.jar
    • hadoop-client-2.2.0.jar
    • hadoop-common-2.2.0.jar
    • hadoop-hdfs-2.2.0.jar
    • hadoop-mapreduce-client-app-2.2.0.jar
    • hadoop-mapreduce-client-common-2.2.0.jar
    • hadoop-mapreduce-client-core-2.2.0.jar
    • hadoop-mapreduce-client-jobclient-2.2.0.jar
    • hadoop-mapreduce-client-shuffle-2.2.0.jar
    • hadoop-yarn-api-2.2.0.jar
    • hadoop-yarn-client-2.2.0.jar
    • hadoop-yarn-common-2.2.0.jar
    • hadoop-yarn-server-common-2.2.0.jar
    • ivy-2.4.0.jar
    • jackson-annotations-2.4.0.jar
    • jackson-core-2.4.4.jar
    • jackson-core-asl-1.8.8.jar
    • jackson-databind-2.4.4.jar
    • jackson-jaxrs-1.8.8.jar
    • jackson-mapper-asl-1.8.8.jar
    • jackson-module-scala_2.10-2.4.4.jar
    • jackson-xc-1.8.8.jar
    • jansi-1.4.jar
    • javax.inject-1.jar
    • javax.servlet-3.0.0.v201112011016.jar
    • javax.servlet-3.1.jar
    • javax.servlet-api-3.0.1.jar
    • jaxb-api-2.2.2.jar
    • jaxb-impl-2.2.3-1.jar
    • jcl-over-slf4j-1.7.10.jar
    • jersey-client-1.9.jar
    • jersey-core-1.9.jar
    • jersey-grizzly2-1.9.jar
    • jersey-guice-1.9.jar
    • jersey-json-1.9.jar
    • jersey-server-1.9.jar
    • jersey-test-framework-core-1.9.jar
    • jersey-test-framework-grizzly2-1.9.jar
    • jets3t-0.7.1.jar
    • jettison-1.1.jar
    • jetty-util-6.1.26.jar
    • jline-0.9.94.jar
    • jline-2.10.4.jar
    • jodd-core-3.6.3.jar
    • json4s-ast_2.10-3.2.10.jar
    • json4s-core_2.10-3.2.10.jar
    • json4s-jackson_2.10-3.2.10.jar
    • jsr305-1.3.9.jar
    • jtransforms-2.4.0.jar
    • jul-to-slf4j-1.7.10.jar
    • kryo-2.21.jar
    • log4j-1.2.17.jar
    • lz4-1.2.0.jar
    • management-api-3.0.0-b012.jar
    • mesos-0.21.0-shaded-protobuf.jar
    • metrics-core-3.1.0.jar
    • metrics-graphite-3.1.0.jar
    • metrics-json-3.1.0.jar
    • metrics-jvm-3.1.0.jar
    • minlog-1.2.jar
    • netty-3.8.0.Final.jar
    • netty-all-4.0.23.Final.jar
    • objenesis-1.2.jar
    • opencsv-2.3.jar
    • oro-2.0.8.jar
    • paranamer-2.6.jar
    • parquet-column-1.6.0rc3.jar
    • parquet-common-1.6.0rc3.jar
    • parquet-encoding-1.6.0rc3.jar
    • parquet-format-2.2.0-rc1.jar
    • parquet-generator-1.6.0rc3.jar
    • parquet-hadoop-1.6.0rc3.jar
    • parquet-jackson-1.6.0rc3.jar
    • protobuf-java-2.4.1.jar
    • protobuf-java-2.5.0-spark.jar
    • py4j-0.8.2.1.jar
    • pyrolite-2.0.1.jar
    • quasiquotes_2.10-2.0.1.jar
    • reflectasm-1.07-shaded.jar
    • scala-compiler-2.10.4.jar
    • scala-library-2.10.4.jar
    • scala-reflect-2.10.4.jar
    • scalap-2.10.4.jar
    • scalatest_2.10-2.2.1.jar
    • slf4j-api-1.7.10.jar
    • slf4j-log4j12-1.7.10.jar
    • snappy-java-1.1.1.7.jar
    • spark-bagel_2.10-1.4.0-SNAPSHOT.jar
    • spark-catalyst_2.10-1.4.0-SNAPSHOT.jar
    • spark-core_2.10-1.4.0-SNAPSHOT.jar
    • spark-graphx_2.10-1.4.0-SNAPSHOT.jar
    • spark-launcher_2.10-1.4.0-SNAPSHOT.jar
    • spark-mllib_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-common_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar
    • spark-repl_2.10-1.4.0-SNAPSHOT.jar
    • spark-sql_2.10-1.4.0-SNAPSHOT.jar
    • spark-streaming_2.10-1.4.0-SNAPSHOT.jar
    • spire-macros_2.10-0.7.4.jar
    • spire_2.10-0.7.4.jar
    • stax-api-1.0.1.jar
    • stream-2.7.0.jar
    • tachyon-0.6.4.jar
    • tachyon-client-0.6.4.jar
    • uncommons-maths-1.2.2a.jar
    • unused-1.0.0.jar
    • xmlenc-0.52.jar
    • xz-1.0.jar
    • zookeeper-3.4.5.jar

@AmplabJenkins
Copy link

Build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31792/
Test PASSed.

@@ -104,6 +108,52 @@ class HistoryServer(

initialize()

private[spark] class HistoryServlet(val appId: String) extends HttpServlet {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you rename this to something more specific? eg., EventLogDownloadServlet?
also I think it can probably be priviate[history]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this servlet will go away when I merge this with the JSON end point change. I'm not sure whether I should add this to the JSON endpoint change or leave this servlet as is (since this is not really a JSON output, and from the current state it looks like the endpoints seem specific to returning JSON).

@squito
Copy link
Contributor

squito commented May 13, 2015

Hi @harishreedharan, I left a couple of comments, but some more high-level comments:

  • right now this lives in a separate Servlet, and is served in paths similar to the UI. Now that we have a separate servlet for the rest api, woudl it make more sense to combine it with that? Eg., add another path here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/api/v1/JsonRootResource.scala#L49
    for @Path(applications/{appId}/{attemptId}/eventLogs). I think you can also get jersey to do the content-type filtering for you. This might require a little more refactoring to those utils, and perhaps some renaming (eg., JsonRootResource maybe should really be RestRootResource). Its also possible that just wouldn't work, but wanted to check.
  • can you add some tests? You should be able to add to HistoryServerSuite -- there are already some event logs available as test resources

@harishreedharan
Copy link
Contributor Author

Thanks @squito for taking a look. Yes, this lives in a separate servlet right now, as this was pre-JSON api getting pushed. From what I understand (I don't really have much idea about Jersey, so I am most likely wrong), the current code does not provide a way to return a file (the JsonRootResource seems to be able to return only JSON, as Jersey expects that). I think we could add another resource which can return files).

I will add some tests - I wanted some feedback on the approach and the API before I did.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented Jun 3, 2015

Test build #34058 has started for PR 5792 at commit a131be6.

@SparkQA
Copy link

SparkQA commented Jun 3, 2015

Test build #34057 has finished for PR 5792 at commit b422a08.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Params(
    • case class Params(

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@SparkQA
Copy link

SparkQA commented Jun 3, 2015

Test build #34058 has finished for PR 5792 at commit a131be6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Params(
    • case class Params(

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@squito
Copy link
Contributor

squito commented Jun 3, 2015

can you add this to the Rest API section in docs/monitoring.md? otherwise lgtm

@harishreedharan
Copy link
Contributor Author

@squito Thanks for taking a look! I just updated the docs.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented Jun 3, 2015

Test build #34095 has started for PR 5792 at commit 221cc26.

@asfgit asfgit closed this in d2a86eb Jun 3, 2015
@rxin
Copy link
Contributor

rxin commented Jun 3, 2015

This might've broken the Hadoop 1 build

@vanzin
Copy link
Contributor

vanzin commented Jun 3, 2015

This might've broken the Hadoop 1 build

We should really add some hadoop 1 PR sanity checker if we really plan to keep on supporting it. PRs only build against hadoop 2 currently.

@rxin
Copy link
Contributor

rxin commented Jun 3, 2015

Yea that might be good to do. It's not super high cost right now though since it is caught by the master build matrix, and the frequency of it happening is low.

@harishreedharan
Copy link
Contributor Author

@rxin Can you please post a link to the broken build? I don't think any hadoop-2 specific APIs were used - either way there should be an equivalent API in Hadoop 1, if I did use a Hadoop-2 specific API

@harishreedharan
Copy link
Contributor Author

Looks like listFiles didn't exist in Hadoop-1.

@squito, @rxin - I can submit a follow up patch in a few mins.

@SparkQA
Copy link

SparkQA commented Jun 3, 2015

Test build #34095 has finished for PR 5792 at commit 221cc26.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

asfgit pushed a commit that referenced this pull request Jun 3, 2015
Replaced `fs.listFiles` with Hadoop-1 friendly `fs.listStatus` method.

Author: Hari Shreedharan <[email protected]>

Closes #6619 from harishreedharan/evetlog-hadoop-1-fix and squashes the following commits:

6192078 [Hari Shreedharan] [HOTFIX] Fix Hadoop-1 build caused by #5972.
<td>Download the event logs for all attempts of the given application as a zip file</td>
</tr>
<tr>
<td><code>/applications/[app-id]/[attempt-id/logs</code></td>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this has already been merged, but there's a stray [ here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want me to open another PR with these changes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually that's a missing ].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harishreedharan Yeah, we should fix this one. Sorry I wasn't able to review this in time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewor14 Here you are: #6628

@squito
Copy link
Contributor

squito commented Jun 4, 2015

thanks for taking another look and catching this stuff, sorry I missed it @andrewor14

jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
… fro...

...m History Server

This PR adds a new API that allows the user to download event logs for an application as a zip file. APIs have been added to download all logs for a given application or just for a specific attempt.

This also add an additional method to the ApplicationHistoryProvider to get the raw files, zipped.

Author: Hari Shreedharan <[email protected]>

Closes apache#5792 from harishreedharan/eventlog-download and squashes the following commits:

221cc26 [Hari Shreedharan] Update docs with new API information.
a131be6 [Hari Shreedharan] Fix style issues.
5528bd8 [Hari Shreedharan] Merge branch 'master' into eventlog-download
6e8156e [Hari Shreedharan] Simplify tests, use Guava stream copy methods.
d8ddede [Hari Shreedharan] Remove unnecessary case in EventLogDownloadResource.
ffffb53 [Hari Shreedharan] Changed interface to use zip stream. Added more tests.
1100b40 [Hari Shreedharan] Ensure that `Path` does not appear in interfaces, by rafactoring interfaces.
5a5f3e2 [Hari Shreedharan] Fix test ordering issue.
0b66948 [Hari Shreedharan] Minor formatting/import fixes.
4fc518c [Hari Shreedharan] Fix rat failures.
a48b91f [Hari Shreedharan] Refactor to make attemptId optional in the API. Also added tests.
0fc1424 [Hari Shreedharan] File download now works for individual attempts and the entire application.
350d7e8 [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into eventlog-download
fd6ab00 [Hari Shreedharan] Fix style issues
32b7662 [Hari Shreedharan] Use UIRoot directly in ApiRootResource. Also, use `Response` class to set headers.
7b362b2 [Hari Shreedharan] Almost working.
3d18ebc [Hari Shreedharan] [WIP] Try getting the event log download to work.
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
Replaced `fs.listFiles` with Hadoop-1 friendly `fs.listStatus` method.

Author: Hari Shreedharan <[email protected]>

Closes apache#6619 from harishreedharan/evetlog-hadoop-1-fix and squashes the following commits:

6192078 [Hari Shreedharan] [HOTFIX] Fix Hadoop-1 build caused by apache#5972.
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
… fro...

...m History Server

This PR adds a new API that allows the user to download event logs for an application as a zip file. APIs have been added to download all logs for a given application or just for a specific attempt.

This also add an additional method to the ApplicationHistoryProvider to get the raw files, zipped.

Author: Hari Shreedharan <[email protected]>

Closes apache#5792 from harishreedharan/eventlog-download and squashes the following commits:

221cc26 [Hari Shreedharan] Update docs with new API information.
a131be6 [Hari Shreedharan] Fix style issues.
5528bd8 [Hari Shreedharan] Merge branch 'master' into eventlog-download
6e8156e [Hari Shreedharan] Simplify tests, use Guava stream copy methods.
d8ddede [Hari Shreedharan] Remove unnecessary case in EventLogDownloadResource.
ffffb53 [Hari Shreedharan] Changed interface to use zip stream. Added more tests.
1100b40 [Hari Shreedharan] Ensure that `Path` does not appear in interfaces, by rafactoring interfaces.
5a5f3e2 [Hari Shreedharan] Fix test ordering issue.
0b66948 [Hari Shreedharan] Minor formatting/import fixes.
4fc518c [Hari Shreedharan] Fix rat failures.
a48b91f [Hari Shreedharan] Refactor to make attemptId optional in the API. Also added tests.
0fc1424 [Hari Shreedharan] File download now works for individual attempts and the entire application.
350d7e8 [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into eventlog-download
fd6ab00 [Hari Shreedharan] Fix style issues
32b7662 [Hari Shreedharan] Use UIRoot directly in ApiRootResource. Also, use `Response` class to set headers.
7b362b2 [Hari Shreedharan] Almost working.
3d18ebc [Hari Shreedharan] [WIP] Try getting the event log download to work.
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
Replaced `fs.listFiles` with Hadoop-1 friendly `fs.listStatus` method.

Author: Hari Shreedharan <[email protected]>

Closes apache#6619 from harishreedharan/evetlog-hadoop-1-fix and squashes the following commits:

6192078 [Hari Shreedharan] [HOTFIX] Fix Hadoop-1 build caused by apache#5972.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants