-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Use tika to detect content type more accurately #303
Conversation
@jglick |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
652K (no transitive deps) so will slow down
Line 123 in 71934b1
Map<String, String> contentTypes = workspace.act(new ContentTypeGuesser(new ArrayList<>(artifacts.keySet()), listener)); |
Cloud
s). Not sure it is worth the cost to get more correct content types in some cases; would prefer for Files.probeContentType
to be improved in the JDK. Did you have a particular content type that the plugin currently fails to detect properly?
My ".log" file and ".json" files are all in "binary/octet-stream" type. I noticed that it's MasterToSlaveFileCallable so the classes are sent to the agent. One solution is to create another ContentTypeGuesser using tika implementation and make this an option on the UI side What do you think about it? |
Ouch, Prefer to avoid cluttering the UI with obscure options. A possible compromise: adjust |
I change the code that only uses tika when JRE can't detect the content type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Would you mind adding something to
Lines 292 to 303 in 728e7db
p.setDefinition(new CpsFlowDefinition("node('remote') {writeFile file: 'f.txt', text: '" + text + "'; writeFile file: 'f.html', text: '" + html + "'; writeFile file: 'f', text: '\\u0000'; archiveArtifacts 'f*'}", true)); | |
j.buildAndAssertSuccess(p); | |
WebResponse response = j.createWebClient().goTo("job/p/1/artifact/f.txt", null).getWebResponse(); | |
assertThat(response.getContentAsString(), equalTo(text)); | |
assertThat(response.getContentType(), equalTo("text/plain")); | |
response = j.createWebClient().goTo("job/p/1/artifact/f.html", null).getWebResponse(); | |
assertThat(response.getContentAsString(), equalTo(html)); | |
assertThat(response.getContentType(), equalTo("text/html")); | |
response = j.createWebClient().goTo("job/p/1/artifact/f", null).getWebResponse(); | |
assertThat(response.getContentLength(), equalTo(1L)); | |
assertThat(response.getContentType(), containsString("/octet-stream")); |
*.json
or whatever now gets detected properly?
src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java
Outdated
Show resolved
Hide resolved
src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java
Outdated
Show resolved
Hide resolved
(BTW for purposes of review it is nicer to push follow-on commits, as rebasing breaks incremental review. Does not matter much in this case since the diff is so short to begin with.) |
Move Tika to a separate util class to avoid repeatedly initializing. Add test to check the correct json content type
Done! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice, thanks!
Use tika to detect content type more accurately.

I perform a manual test that the content type in s3 is expected.