Parallelise s3 cache probing #97

mheinzel · 2017-07-27T08:55:27Z

Just gave #79 a try.

Disclaimer: I don't have a Carthage dev environment, so this it completely untested (but it typechecks 😉 ).
Please try if it works as intended before merging!

Commit 1 just parallelises the requests with minimal changes to existing code using async-lifted.
async in itself is a really nice library, but it wants the parallelised actions to be IO actions and since we have a ReaderT s IO here, we can't use it. async-lifted provides a more general interface using MonadBaseControl which we can use.

Commit 2 makes the code more concise, removing one top level function. I find this version to be more understandable (less passing around of long arguments), but that might be different for other people.

Commit 3 avoids MonadBaseControl and uses plain async, but needs to do some additional gymnastics to run the ReaderT.

By deciding which commits to pick, you can choose the version you want ;)
Minimal changes, more concise, avoiding async-lifted...

hlintBot · 2017-07-27T08:57:22Z

2 Warnings

⚠️

src/Lib.hs#L534 - Found Reduce duplication

1frameworkNameWithFrameworkExtension = appendFrameworkExtensionTo f
2platformBuildDirectory = carthageBuildDirectoryForPlatform platform
3frameworkDirectory
4  = platformBuildDirectory </> frameworkNameWithFrameworkExtension
5

Why Not

1Combine with src/Lib.hs:959:5

⚠️

src/Lib.hs#L534 - Found Reduce duplication

 1frameworkNameWithFrameworkExtension = appendFrameworkExtensionTo f
 2platformBuildDirectory = carthageBuildDirectoryForPlatform platform
 3frameworkDirectory
 4  = platformBuildDirectory </> frameworkNameWithFrameworkExtension
 5dSYMNameWithDSYMExtension
 6  = frameworkNameWithFrameworkExtension ++ ".dSYM"
 7dSYMdirectory
 8  = platformBuildDirectory </> dSYMNameWithDSYMExtension
 9

Why Not

1Combine with src/Lib.hs:582:5

Generated by 🚫 Danger

tmspzz · 2017-07-27T09:23:10Z

@mheinzel Great work! Thanks a lot!

I think I like commit 0097856 the most.

I'm very tempted by f1978ca as well because it deletes some code. However the pattern so far has been:

write a function to do a thing once
write a wrapper to do it many times

and this commit would break the pattern.

It's debatable if this philosophy was a good approach to begin with, the intention was to remove the cognitive load of the maps and flatMaps and leave the code more readable. It certainly helped me when I started. I'm not sure it's making the code easier to read now. What do you think?

Commit 4040f43 imho is less readable, introduces do, and runs the reader. I have no problem with async-lifted, should I have a problem with that? :D

tmspzz · 2017-07-27T10:12:13Z

I tried it out and seems to work to me.

The speed up is pretty impressive. It's almost instant.

This can probably be done for downloading and uploading at well but I wonder if it makes sense to split the bandwidth. Also there is printing when downloading/uploading operations take place, probably output on console will be garbled and out of order. Something to consider.

mheinzel · 2017-07-27T10:41:44Z

I don't think the pattern is bad, but in this situation it seems a bit too fine-grained.

Also, what makes it so verbose here is that there are so many arguments you just need to pass through, but have to mention again and again (and the type signatures get pretty long).
To reduce this, you could bundle some of this information into some kind Env data type and put it all into the Reader. Bucket name, inverted repo map, AWS env, verbosity settings and cache prefix string (once obtained in the beginning) should not change during execution of the program, right?

Regarding async-lifted:
It has some additional complexity with MonadBaseControl, but you don't really have to deal with this concept to understand and work with the code.
One disadvantage might be that once we convert the code to mtl, we would need a MonadBaseControl constraint in the type signature, which would bubble up into functions using it 🤔
But no problem for now, we can still decide to explicitly run the Reader then...

mheinzel · 2017-07-27T10:42:43Z

By the way, should upload and download of artifacst be parallelised as well?

tmspzz · 2017-07-27T11:03:46Z

I suggest to proceed with the following in this PR:

Merge 0097856

leave the rest out (for now).

I think the Env Data structure going forward would be a nice solution for the verbosity of the arg passing. Some of the arguments are already hidden in the reader.

Probably as you suggested time ago since there is a lot of ReaderT combinations this approach can be used: https://hackernoon.com/the-has-type-class-pattern-ca12adab70ae

Also, don't know if relevant but there is this: https://www.fpcomplete.com/blog/2017/07/the-rio-monad

Once the Env is there then probably it makes sense to take a look again and maybe crush the wrappers that just map over stuff.

About uploading/downloading of artifacts:
It would be nice to parallelise those as well but I wonder:

Is there any benefit in splitting the bandwidth in N? what do you think?
There are print statements all over the place and I fear those might overlap in the output on console. This does not happen when listing because the outcome is collected in a data structure (clever move by @r-peck )

If you want to take a stab at this it would be great. Obviously that would be a different PR.

mheinzel · 2017-07-27T12:15:58Z

Sure, let's just merge the first commit.

Downloading multiple files concurrently could be faster in some situations, but not as much as in this case. The perfect number of concurrent downloads depends on a lot of factors...
We could just try it at some point, but it's not urgent.

tmspzz · 2017-07-27T12:54:17Z

@mheinzel will you push force just the first commit or should we do this?

tmspzz · 2017-07-27T13:27:00Z

merged! 🎉

Parallelise S3 cache probing

0097856

tmspzz self-assigned this Jul 27, 2017

tmspzz approved these changes Jul 27, 2017

View reviewed changes

tmspzz merged commit c195fe6 into tmspzz:master Jul 27, 2017

mheinzel deleted the parallel-requests branch July 27, 2017 13:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parallelise s3 cache probing #97

Parallelise s3 cache probing #97

mheinzel commented Jul 27, 2017

hlintBot commented Jul 27, 2017 •

edited

Loading

tmspzz commented Jul 27, 2017 •

edited

Loading

tmspzz commented Jul 27, 2017 •

edited

Loading

mheinzel commented Jul 27, 2017

mheinzel commented Jul 27, 2017

tmspzz commented Jul 27, 2017

mheinzel commented Jul 27, 2017

tmspzz commented Jul 27, 2017

tmspzz commented Jul 27, 2017

Parallelise s3 cache probing #97

Parallelise s3 cache probing #97

Conversation

mheinzel commented Jul 27, 2017

hlintBot commented Jul 27, 2017 • edited Loading

tmspzz commented Jul 27, 2017 • edited Loading

tmspzz commented Jul 27, 2017 • edited Loading

mheinzel commented Jul 27, 2017

mheinzel commented Jul 27, 2017

tmspzz commented Jul 27, 2017

mheinzel commented Jul 27, 2017

tmspzz commented Jul 27, 2017

tmspzz commented Jul 27, 2017

hlintBot commented Jul 27, 2017 •

edited

Loading

tmspzz commented Jul 27, 2017 •

edited

Loading

tmspzz commented Jul 27, 2017 •

edited

Loading