Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelise s3 cache probing #97

Merged
merged 1 commit into from
Jul 27, 2017
Merged

Parallelise s3 cache probing #97

merged 1 commit into from
Jul 27, 2017

Conversation

mheinzel
Copy link
Contributor

Just gave #79 a try.

Disclaimer: I don't have a Carthage dev environment, so this it completely untested (but it typechecks 😉 ).
Please try if it works as intended before merging!

Commit 1 just parallelises the requests with minimal changes to existing code using async-lifted.
async in itself is a really nice library, but it wants the parallelised actions to be IO actions and since we have a ReaderT s IO here, we can't use it. async-lifted provides a more general interface using MonadBaseControl which we can use.

Commit 2 makes the code more concise, removing one top level function. I find this version to be more understandable (less passing around of long arguments), but that might be different for other people.

Commit 3 avoids MonadBaseControl and uses plain async, but needs to do some additional gymnastics to run the ReaderT.

By deciding which commits to pick, you can choose the version you want ;)
Minimal changes, more concise, avoiding async-lifted...

@hlintBot
Copy link

hlintBot commented Jul 27, 2017

2 Warnings
⚠️ src/Lib.hs#L534 - Found Reduce duplication

1frameworkNameWithFrameworkExtension = appendFrameworkExtensionTo f
2platformBuildDirectory = carthageBuildDirectoryForPlatform platform
3frameworkDirectory
4  = platformBuildDirectory </> frameworkNameWithFrameworkExtension
5

Why Not

1Combine with src/Lib.hs:959:5
⚠️ src/Lib.hs#L534 - Found Reduce duplication

 1frameworkNameWithFrameworkExtension = appendFrameworkExtensionTo f
 2platformBuildDirectory = carthageBuildDirectoryForPlatform platform
 3frameworkDirectory
 4  = platformBuildDirectory </> frameworkNameWithFrameworkExtension
 5dSYMNameWithDSYMExtension
 6  = frameworkNameWithFrameworkExtension ++ ".dSYM"
 7dSYMdirectory
 8  = platformBuildDirectory </> dSYMNameWithDSYMExtension
 9

Why Not

1Combine with src/Lib.hs:582:5

Generated by 🚫 Danger

@tmspzz tmspzz self-assigned this Jul 27, 2017
@tmspzz
Copy link
Owner

tmspzz commented Jul 27, 2017

@mheinzel Great work! Thanks a lot!

I think I like commit 0097856 the most.

I'm very tempted by f1978ca as well because it deletes some code. However the pattern so far has been:

  • write a function to do a thing once
  • write a wrapper to do it many times

and this commit would break the pattern.

It's debatable if this philosophy was a good approach to begin with, the intention was to remove the cognitive load of the maps and flatMaps and leave the code more readable. It certainly helped me when I started. I'm not sure it's making the code easier to read now. What do you think?

Commit 4040f43 imho is less readable, introduces do, and runs the reader. I have no problem with async-lifted, should I have a problem with that? :D

@tmspzz
Copy link
Owner

tmspzz commented Jul 27, 2017

I tried it out and seems to work to me.

The speed up is pretty impressive. It's almost instant.

This can probably be done for downloading and uploading at well but I wonder if it makes sense to split the bandwidth. Also there is printing when downloading/uploading operations take place, probably output on console will be garbled and out of order. Something to consider.

@mheinzel
Copy link
Contributor Author

I don't think the pattern is bad, but in this situation it seems a bit too fine-grained.

Also, what makes it so verbose here is that there are so many arguments you just need to pass through, but have to mention again and again (and the type signatures get pretty long).
To reduce this, you could bundle some of this information into some kind Env data type and put it all into the Reader. Bucket name, inverted repo map, AWS env, verbosity settings and cache prefix string (once obtained in the beginning) should not change during execution of the program, right?

Regarding async-lifted:
It has some additional complexity with MonadBaseControl, but you don't really have to deal with this concept to understand and work with the code.
One disadvantage might be that once we convert the code to mtl, we would need a MonadBaseControl constraint in the type signature, which would bubble up into functions using it 🤔
But no problem for now, we can still decide to explicitly run the Reader then...

@mheinzel
Copy link
Contributor Author

By the way, should upload and download of artifacst be parallelised as well?

@tmspzz
Copy link
Owner

tmspzz commented Jul 27, 2017

I suggest to proceed with the following in this PR:

leave the rest out (for now).

I think the Env Data structure going forward would be a nice solution for the verbosity of the arg passing. Some of the arguments are already hidden in the reader.

Probably as you suggested time ago since there is a lot of ReaderT combinations this approach can be used: https://hackernoon.com/the-has-type-class-pattern-ca12adab70ae

Also, don't know if relevant but there is this: https://www.fpcomplete.com/blog/2017/07/the-rio-monad

Once the Env is there then probably it makes sense to take a look again and maybe crush the wrappers that just map over stuff.

About uploading/downloading of artifacts:
It would be nice to parallelise those as well but I wonder:

  • Is there any benefit in splitting the bandwidth in N? what do you think?
  • There are print statements all over the place and I fear those might overlap in the output on console. This does not happen when listing because the outcome is collected in a data structure (clever move by @r-peck )

If you want to take a stab at this it would be great. Obviously that would be a different PR.

@mheinzel
Copy link
Contributor Author

Sure, let's just merge the first commit.

Downloading multiple files concurrently could be faster in some situations, but not as much as in this case. The perfect number of concurrent downloads depends on a lot of factors...
We could just try it at some point, but it's not urgent.

@tmspzz
Copy link
Owner

tmspzz commented Jul 27, 2017

@mheinzel will you push force just the first commit or should we do this?

@tmspzz tmspzz merged commit c195fe6 into tmspzz:master Jul 27, 2017
@tmspzz
Copy link
Owner

tmspzz commented Jul 27, 2017

merged! 🎉

@mheinzel mheinzel deleted the parallel-requests branch July 27, 2017 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants