Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow revision #45

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
Open

Conversation

MichaelXavier
Copy link

This is to further the discussion we were having in #33

I merged in upstream master so this should merge cleanly. You were asking if its possible for git ls-remote could be used to clean this up. I think that's for phase 2 of what I described in the original issue. The way I see it, this gets psc-package operating as-documented. Right now it says you can specify a tag or SHA. In reality, it lets you specify a branch or tag but not a SHA. It seems like supporting a branch may be a misfeature because it is not a fixed single commit but rather a moving target, yet psc-package basically assumes its a fixed target leading to some confusion.

The correct final solution here is to forbid branches, which involves some sort of git incantation to tell us what type a tree-ish is and giving a nice error message if its a branch explaining that psc-package targets must be immutable. I also suspect that once we do that, we can bring back the "exists" checks to do fewer downloads. Its totally ok if you don't want to do this half-measure, it was just the simplest solution I knew how to implement right now and solved a problem I was having.

This could be a half-measure for purescript#33. This does not prevent usage of
master, but it arguably does behave how you'd expect: it always keeps
whatever reference you're using as a package set up to date, and
happens to also allow you to use a SHA in addition to a revision. I'm
going to give this a spin, but if we later wanted to disallow
branches, we could probably cobble together a git command that would
tell us if a tree-ish or whatever they call it is a branch or not.
view actually dumps the exit status, sh does not.
@paf31
Copy link
Contributor

paf31 commented Jul 27, 2017

Sorry for the delay, it's been a busy week.

This seems like it could decrease performance for repos with many tags. Is there no simpler way to pull a single SHA hash from a remote repo?

@MichaelXavier
Copy link
Author

Sorry for the delay. That's a valid concern if I understand it correctly. I think we could avoid ever redundantly fetching if we could treat the reference as immutable, which SHAs and tags should be. So I see a few paths forward:

  1. Document that this only supports SHA and tag. Behavior of branches is undefined. Then we keep the fetch mechanism but if the destination exists, do not fetch. That way you pay the requests once and when you add a new dep or change the SHA/tag.
  2. Do 1 but also somehow try to protect the user from themselves by identifying that the token they gave is not a SHA or tag but is a branch and throw an error. I do not know how to do this with git or if its even possible. For instance, is it possible for a tag to have the same name as a branch?

@Pauan
Copy link

Pauan commented Aug 3, 2017

@MichaelXavier Why wouldn't git ls-remote work? It displays data in this format:

12d92467ae6ecb13957bb1ab5ea07670bd985c35	refs/heads/branch1
cb1979308583e8420cead50e6b524d90353e0a84	refs/heads/branch2
17d415ab1e30fb8b9002de4bac4d9e543ebca7e9	refs/heads/branch3
9d9000d30ab9b5a5a51f9d2e38c135d4fd2c75b6	refs/tags/0.5.1
d0944a8ae00ed44dfb0e382b372d8ef82841b496	refs/tags/0.6.0
e4d308e25d9f6752ee4b2b817c69296672e191b5	refs/tags/0.7.0

So you should be able to do something like this:

git ls-remote --refs '<GIT URL HERE>' '<GIT TAG OR COMMIT HERE>'

The above command only prints the branches/tags which exactly match <GIT TAG OR COMMIT HERE>

You can even specify multiple tags/commits if you want.

So all you have to do is verify that there aren't any refs/heads/, and that there is exactly zero or one refs/tags/

And it also helpfully gives you the commit for the tag, so you can use that for additional robustness (e.g. if the tag changes while downloading the repo it will still checkout the correct commit).

There might be an easier or faster way of checking, but I think using git ls-remote should work.

@Pauan
Copy link

Pauan commented Aug 3, 2017

As for downloading a specific commit, it seems that's not possible.

The best you can do seems to be:

git clone --no-checkout '<GIT URL HERE>' foo
cd foo
git checkout '<GIT COMMIT HERE>'

So by using git ls-remote you can tell whether it's a branch/tag/commit, then:

  • Error if it's a branch

  • Use the fast clone if it's a tag

  • Use the slow clone + checkout (the above code) if it's an SHA

@Pauan
Copy link

Pauan commented Aug 3, 2017

For GitHub in particular you can efficiently download a specific commit by downloading this URL:

https://github.com/<USERNAME>/<PACKAGE>/archive/<COMMIT>.zip

e.g. https://github.com/purescript/psc-package/archive/e9c81dc641ad845714fca792a48f3de2345f62c3.zip

But that only works for GitHub, not Git repos in general.

@Pauan
Copy link

Pauan commented Aug 3, 2017

Oh, you can also use .tar.gz rather than .zip

For reference, Nix uses the .tag.gz URL to fetch from GitHub.

Nix does a fetch + checkout to fetch from non-GitHub git repos. But they also do a lot of other complex stuff too.

Personally, I recommend downloading the .tar.gz from GitHub when the URL is https://github.com, and otherwise fall back to clone + checkout

@paf31
Copy link
Contributor

paf31 commented Aug 3, 2017

Yes, I'd definitely like to use that path to pull things from GitHub, but we can deal with that separately. I like the idea of using ls-remote to solve this issue.

@MichaelXavier
Copy link
Author

@Pauan thanks for weighing in. I knew someone with a lot of deep git knowledge would have a solution. I am not sure when I'll have time to work on this again, probably wednesday but @Pauan if you have a workable solution and some free time until then, feel free to fixup my patch. Basically I'm just going to be implementing the commends you recommended anyway.

@chexxor
Copy link

chexxor commented Aug 7, 2017

I discovered git archive last week, which I wish we could use. While GitLab supports it, looks like GitHub and Bitbucket does not. :( - https://www.gilesorr.com/blog/git-archive-github.html

Here is a list of "tree-ish" things that it should accept.

$ git archive --remote=https://github.com/purescript/purescript-prelude.git --output=./dep-version.tar.gz <tree-ish>
$ git archive --remote=https://github.com/purescript/purescript-prelude.git --output=./dep-version.tar.gz v3.1.0
$ git archive --remote=https://github.com/purescript/purescript-prelude.git --output=./dep-version.tar.gz 6d411827970302769895877511f51f7a414ecbb3

This is for purescript#45. We'll see what the tests say.

Summary:
* I use a newtype for Repo since things were starting to get
complicated differentiating it from tags. Hope that's ok.
* I use the ls-remote trick (you're a lifesaver, @Pauan!) to pull
  refs. If the ref is *proven* to be a branch, bail out with an
  error. If its *proven* to be a tag, we can shortcut the clone
  without running a checkout afterwards. Otherwise, assume its a SHA.
* Restore the "exists" checks since we have decided to treat SHAs and
  tags as immutable. This should avoid unnecessary network calls on
  subsequent runs.
app/Main.hs Outdated
let refs = Set.fromList (mapMaybe parseRef remoteLines)
if Set.member rawAsBranch refs
then do
echoT (raw <> " is a branch. psc-package only supports tags and SHAs.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of disallowing this outright, I'm thinking we might want to put it behind a flag. What do you think?

We'd probably need to clone the repo into a directory named after the SHA hash of the branch or something though. Otherwise the files on disk could get out of sync with the remote branch.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disallowing branching was basically to allow us to not have to check every time. SHA and tag are both considered fixed commits whereas branches can change at any time. It just doesn't seem to me that the argument for a branch is worth the cost, but I could be convinced otherwise.

If we did stick to just SHA and tag then I don't think we'd have to bother with special folder naming because if you pulled it once, you're done.

app/Main.hs Outdated
where
rawAsBranch = "refs/heads/" <> raw
rawAsTag = "refs/tags/" <> raw
gitProc = inproc "git" ["ls-remote", "-q", "--refs", from] empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why inproc here? Don't we use proc elsewhere? Is there any important difference?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

proc seems to return an ExitCode. inproc returns a ByteString which I need in order to parse the output.

app/Main.hs Outdated
, sha
] empty .||. exit (ExitFailure 1)
where
inGitRepo m = (sh (pushd into >> m))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: redundant parens. Also can you please indent this past the where (or use a let)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I personally like the indented style but I've seen both in the file. I'll just go through and standardize in this file.

app/Main.hs Outdated
] empty .||. exit (ExitFailure 1)
cloneShallow (Repo from) (CloneSHA sha) into = do
void $ proc "git"
[ "clone"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well pull this common prefix out into a function or constant since the two are so similar.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this is done on the branch now.

@paf31
Copy link
Contributor

paf31 commented Aug 17, 2017

Looks good, thanks for the updates.

I do still have a slight concern that this will add a performance overhead, since running ls-remote on a repo with many tags does seems to take a short while. The compiler repo has many SHA hashes for example:

$ time git ls-remote -q --refs | wc -l
    1687

real	0m0.862s
user	0m0.014s
sys	0m0.016s

Perhaps what we can do is to only look at tags and heads (with -t -h), and assume the input is a SHA if it's not in either of those lists. I assume branches will show up in the output if we use -h.

@Pauan
Copy link

Pauan commented Aug 17, 2017

@paf31 I didn't realize that ls-remote pulls in other things like pull requests as well.

Using git ls-remote --heads --tags --refs sounds like a good idea, even though it's the same speed.

As per @Paun's suggestion as it returns less extraneous data this way.
@MichaelXavier
Copy link
Author

@Pauan I've implemented your suggestion. Thanks!

@paf31 I think I've addressed the all the issues in your review. Let me know if there's any other concerns.

@MichaelXavier
Copy link
Author

@paf31 just bumping this to see if you needed anything else.

] empty .||. exit (ExitFailure 1)
-> IO ()
cloneShallow (Repo from) tgt into = do
void $ proc "git"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I try this with a SHA hash, I get fatal: Remote branch <sha> not found in upstream origin

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we confirm this works by now?

@paf31
Copy link
Contributor

paf31 commented Sep 1, 2017

The code looks great, thanks.

I'd also like to try using git clone -b ... and falling back to a clone and checkout if the clone fails. I'll look into that now.

@paf31
Copy link
Contributor

paf31 commented Sep 2, 2017

Another option, which allows us to avoid the costly ls-remote lookup, and which I actually prefer, would be to allow URLs like sha://0123456789abcdef or tag://psc-0.11.6 wherever we currently allow a tag. It's explicit, and it doesn't have to be a breaking change.

What do you both think?

Edit: to be clear, I'm suggesting we look for the prefix sha:// before cloning. Then:

  • If it's there, clone the whole repo and checkout the named revision. Verify it actually refers to a commit and not a branch/tag.
  • If not, use git clone -b, and then verify that we didn't clone a branch.

@hdgarrood
Copy link
Contributor

I think it would make more sense to just use sha: or tag: without slashes if we are making up some URL schemes, since the stuff which follows isn't hierarchical. See https://tools.ietf.org/html/rfc2718#section-2.1.2

@paf31
Copy link
Contributor

paf31 commented Sep 20, 2017

I also just found out about git describe which might be useful here.

@paf31
Copy link
Contributor

paf31 commented Nov 19, 2017

Sorry for the long delay. I think we should try to pick this back up.

Since we're making breaking changes anyway, I think we should just use the URL approach, and aim to disallow branches completely.

I'm onboard for the URI scheme thing, but it seems kind of weird that we're still falling back to detecting to a slow method.

Which do you think is the slow method? I think what we use right now should still be possible for tags, no? Only sha:// URLs would be slow since we'd need to do a full clone.

@MichaelXavier
Copy link
Author

I think we're aligned. I think I can move forward on this when I have time:

  1. I'll support sha:// and tag://. I like the specificity of those labels and the fact that it becomes clear what is and isn't supported.
  2. To be clear, since we're breaking, we're going to require these schemes to be specified?

@paf31
Copy link
Contributor

paf31 commented Nov 23, 2017

I would be fine with that (and yes, we'd need to update the package set), but I'd also be fine with just looking for sha:// and assuming it's a tag if we don't see that prefix (in which case, the package set is fine as it is).

In response to recent comments in purescript#45. Now we don't need an expensive
ls-remote call to determine if a clone target is a tag or sha. I'm
still lacking a good way to test this out though.
@MichaelXavier
Copy link
Author

@paf31 I threw up a quick implementation, swapping out the function which calls git ls-remote for one that implements the behavior we described: checks the scheme in the reference for sha or tag and failing that, assumes tag. Do you have any ideas on how this should be tested?

@nwolverson
Copy link
Contributor

What's the status on this now? Looks like this was in a good place, but I don't think @hdgarrood 's point on the URL format was taken into account - sha: rather than sha://. As : is not permitted in refs this is still unambiguous.

This is in accordance to the comments in purescript#45
@MichaelXavier
Copy link
Author

@nwolverson Just pushed an update. I think I glossed over the part about removing the // before.

@MichaelXavier
Copy link
Author

I know there were some changes in who is working on this repo. Any chance of getting this merged? I don't know what is going on with appveyor but it looks unrelated to the changes.

@justinwoo
Copy link
Collaborator

If there are no objections, we should really try to merge this soon, especially since the existing behavior leads to unpleasant surprises.

else case T.toLower schemeName of
"sha" -> Right (CloneSHA withoutScheme)
"tag" -> Right (CloneTag withoutScheme)
_ -> Left ("Invalid scheme. Expected sha: | tag: but got " <> schemeName)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we should probably allow using tags by default and warn about it. Thoughts? cc @kritzcreek

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused. I could see adding a case for "" -> Right (CloneTag withoutScheme). But if they affirmatively provided a scheme that's some unknown, it seems like we should still error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should've read all of the code instead of just this bit :D I see now that we only hit this case if the target contained a : in the first place.

else case T.toLower schemeName of
"sha" -> Right (CloneSHA withoutScheme)
"tag" -> Right (CloneTag withoutScheme)
_ -> Left ("Invalid scheme. Expected sha: | tag: but got " <> schemeName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should've read all of the code instead of just this bit :D I see now that we only hit this case if the target contained a : in the first place.

] empty .||. exit (ExitFailure 1)
-> IO ()
cloneShallow (Repo from) tgt into = do
void $ proc "git"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we confirm this works by now?

app/Main.hs Outdated
_ -> Left ("Invalid scheme. Expected sha: | tag: but got " <> schemeName)
where
(schemeName, remainder) = T.breakOn ":" t
withoutScheme = T.drop 3 remainder
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we probably want to use stripPrefix "://" here instead? I could see some very hard to debug behaviour resulting from malformed URLs otherwise.

https://www.stackage.org/haddock/lts-10.5/text-1.2.2.2/Data-Text.html#v:stripPrefix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should actually just be T.drop 1 here, since the format is now sha:27492fbb19484..., without the slashes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kritzcreek stripPrefix won't work because there's content before the "://", we're trying to extract the scheme beforehand. If we want to be really paranoid about parsing the URI, we could pull in uri-bytestring, but that is probably overkill.

@hdgarrood I think you're right. Probably should have console tested this since we don't really have a formal test suite for this project.

@MichaelXavier
Copy link
Author

I don't know why it doesn't let me comment on the testing. I'm pretty sure I ran the commands on a terminal when I originally wrote it. Considering that @hdgarrood pointed out what appears to be a text parsing bug and the travis tests passed, I'm wondering if the test suite is insufficient as is. I feel unprepared to really thoroughly test this PR. I'm trying to figure a good point to hand this work off.

The drop call was not updated when we went from "scheme://identifier"
to "scheme:identifier" and was eating 2 characters from the identifier
when a scheme was specified.
@MichaelXavier
Copy link
Author

@hdgarrood Just confirmed the bug you spotted with the drop, pushed an update fixing it.

@MichaelXavier
Copy link
Author

Bumping this again. Possibly tagging @justinwoo

@justinwoo
Copy link
Collaborator

Needs some conflict fixing, but otherwise this is good right? Anyone in disagreement?

Copy link
Contributor

@hdgarrood hdgarrood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MichaelXavier
Copy link
Author

Just pushed a merge commit with master. The only conflict was it looks like there was a BowerInfoRepo newtype that wrapped Text and I made it to wrap Repo. The FromJSON instance for that got delegated via generalized newtype deriving so that was all I had to change.

Copy link
Collaborator

@justinwoo justinwoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs more testing or something around cloneShallow, as I can't quite get everything running naively when it should be using the default tag-checkout

@MichaelXavier
Copy link
Author

Talked this over with @justinwoo. I think we're on the same page as far as the idea of this PR:

  1. Allow revisions to be specified in addition to just tags.
  2. So long as psc-package assumes the dependency's revision is immutable, it is generally not sensible to use moving targets like the latest tag or a branch. The workaround for users is to blow away the psc-package cache which is a bad user experience. I would argue and I hope everyone would agree that we should prevent this surprising behavior. You can always extract a revision from a branch at a point in time so there's always a way to get to the correct behavior.

However, I haven't really tracked the progress of this project over the 8 months this PR has been open and don't have a good test plan to ensure this continues to work. Sounded like @justinwoo was willing to take this change across the finish line so I'll leave it to him from here.

Rembane pushed a commit to Rembane/psc-package that referenced this pull request May 26, 2018
* Add lens and mathbox

* Add freeap

* Add flare

* Add aui
@paulyoung
Copy link

Would love to see this land sometime soon. Thanks for all the work so far.

@justinwoo
Copy link
Collaborator

So I did some more digging into this issue and what can be done to fetch repos, but it seems that there's actually no real way to fetch repositories by a commit SHA reliably, being that this flag must be set on the server: git/git@68ee628

  • This means it's not as easy as "git fetch --depth 1 "

  • This does not seem to work with older repositories like slamdata/purescript-halogen.

  • Other tools like Nix will initialize an empty repo, fetch all refs from the remote, and then checkout the SHA from the fetched history.

  • Tools like Nix-Prefetch-GitHub will use the GitHub API to get information about a commit and get a tarball of the contents. This only works for GitHub.

I would personally like to close this and document these findings somewhere clearly so that this feature doesn't get suggested over and over again.

@justinwoo
Copy link
Collaborator

Sorry, I did not remember that this solution works by shallow cloning, so it's already "expensive".

@paulyoung
Copy link

I didn’t realize this but apparently NPM only supports commit hashes for GitHub too.

I think supporting this only for GitHub would be better than not at all.

@JordanMartinez
Copy link

Should we close this PR? I haven't read through this PR's content and what it's trying to change. However, psc-package is generally no longer used by the community as most use spago now, though there may be some who still use it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants