
Faster MD5 and Parallel reports generation #123

Merged

Conversation

vuspenskiy
Contributor

@vuspenskiy vuspenskiy commented Jul 7, 2018

Accelerates coveralls file generation by using DigestInputStream and parallelizes report generation for source files.

For context: we're using a monorepo, and the coveralls sbt task takes so much time that CI times out (lemurheavy/coveralls-public#1154).
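The streaming-hash idea behind the PR can be sketched as follows. This is an illustrative example, not the plugin's actual code: a `DigestInputStream` updates the MD5 digest as a side effect of reading, so the file is traversed once as raw bytes instead of being loaded as lines and re-joined into a string before hashing.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StreamingMd5 {

    // Read the stream once; the digest is updated as bytes pass through.
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (DigestInputStream dis = new DigestInputStream(in, md)) {
            byte[] buf = new byte[8192];
            while (dis.read(buf) != -1) {
                // nothing to do: the DigestInputStream feeds md as we read
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // MD5 of the bytes "hello\n" (same as `echo hello | md5sum`)
        String hash = md5Hex(new ByteArrayInputStream("hello\n".getBytes("UTF-8")));
        System.out.println(hash); // b1946ac92492d2347c6235b4d2611184
    }
}
```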

@gslowikowski
Member

gslowikowski commented Jul 14, 2018

Hi @vuspenskiy

How large is your repository and what are coveralls execution times?
I've checked your PR on the Scala repo (a large project) and coveralls execution times are between 4 and 6 seconds both with and without your changes. I've measured only the time of coveralls.json file generation, without uploading to coveralls.io.

@vuspenskiy
Contributor Author

@gslowikowski about 200 kLOC across 2,500 files; the largest file is 1,300 lines.

I used the forked version in our CircleCI. Before, it was timing out after 10 minutes; now it completes in 43 seconds (everything in sbt ';project outParentProject;coveralls').
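The other half of the speedup is running the per-file work in parallel. A minimal sketch of that idea, using Java parallel streams (the plugin itself is Scala, and `buildRecord` here is a hypothetical stand-in for the per-file hashing and coverage-record building):

```java
import java.util.List;
import java.util.stream.Collectors;

public class ParallelReports {

    // Stand-in for the per-file work: hashing the source and building
    // its coverage record for coveralls.json.
    static String buildRecord(String sourceFile) {
        return "{\"name\":\"" + sourceFile + "\"}";
    }

    public static void main(String[] args) {
        List<String> files = List.of("A.scala", "B.scala", "C.scala");

        // parallelStream() fans the independent per-file work out across
        // cores; collect() still preserves the original input order.
        List<String> records = files.parallelStream()
                .map(ParallelReports::buildRecord)
                .collect(Collectors.toList());

        System.out.println(records.size()); // 3
    }
}
```

This works because each file's record is independent of the others, so the only ordering requirement is on the final collected list.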

@gslowikowski
Member

How large is coveralls.json file?

I will merge your PR; I just want to understand what the root cause of such long execution times can be.

@vuspenskiy
Contributor Author

Hi @gslowikowski, thank you!

I think it was about 7 MB.

@gslowikowski gslowikowski merged commit c88bc5c into scoverage:master Oct 28, 2018
@gslowikowski
Member

Version 1.2.7 was released recently.

I'd like to add one piece of information.

During the first run after the upgrade, most users will see all files reported as changed.

This is because the MD5 is calculated differently now. Previously, every source file was loaded as a sequence of lines, the lines were joined with the newline character (\n), and the MD5 was calculated from the result. Now the MD5 is calculated from the raw file content.

Where is the difference? Previously, if a source file had Windows line endings, they were converted to Unix ones; additionally, a newline at the end of the file was lost. Now the source file content is not transformed in any way before the MD5 is calculated.
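The difference can be reproduced directly. A small sketch (illustrative, not the plugin's code): hashing the re-joined lines of a file with Windows line endings (the old behavior) yields a different digest than hashing the raw bytes (the new behavior), which is why every file shows up as changed once after the upgrade.

```java
import java.security.MessageDigest;

public class Md5Difference {

    static String md5Hex(byte[] bytes) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(bytes)) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // The file as stored on disk, with Windows line endings.
        byte[] raw = "line1\r\nline2\r\n".getBytes("UTF-8");

        // Old behavior: split into lines and re-join with '\n'.
        // This drops the '\r' characters and the trailing newline.
        String joined = String.join("\n", new String(raw, "UTF-8").split("\r?\n"));

        System.out.println(md5Hex(joined.getBytes("UTF-8"))); // hash of "line1\nline2"
        System.out.println(md5Hex(raw));                      // hash of the raw bytes
        // The two digests differ, so coveralls sees the file as changed.
    }
}
```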
