ENH: add ..HEAD to since statement if defined upon publish #4448
FWIW it does publish reasonably faster for me. Additionally, while running with -l debug I spotted that it for some reason still does something to subsubdatasets, which it should not even bother looking at: they were not changed, so with --since functioning correctly `publish` should not have anything to do with them:
$> datalad -l debug publish --since= --to=datalad-public --missing=inherit -r
...
[DEBUG ] Query repo: ['git', 'ls-files', '--stage', '-z', '-d', '-m', '--exclude-standard']
[DEBUG ] Done query repo: ['git', 'ls-files', '--stage', '-z', '-d', '-m', '--exclude-standard']
[DEBUG ] Done <AnnexRepo path=/mnt/datasets/datalad/crawl/openneuro (<class 'datalad.support.annexrepo.AnnexRepo'>)>.get_content_info(...)
[DEBUG ] Parsed version of PathRI '/mnt/datasets/datalad/crawl/openneuro/ds000001' differs from original PosixPath('/mnt/datasets/datalad/crawl/openneuro/ds000001')
[DEBUG ] Parsed version of PathRI '/mnt/datasets/datalad/crawl/openneuro/ds000002' differs from original PosixPath('/mnt/datasets/datalad/crawl/openneuro/ds000002')
[DEBUG ] Parsed version of PathRI '/mnt/datasets/datalad/crawl/openneuro/ds000003' differs from original PosixPath('/mnt/datasets/datalad/crawl/openneuro/ds000003')
[DEBUG ] Parsed version of PathRI '/mnt/datasets/datalad/crawl/openneuro/ds000005' differs from original PosixPath('/mnt/datasets/datalad/crawl/openneuro/ds000005')
... which takes a notable amount of time to get through all the hundreds of them. My wild guess is that it is probably due to |
Codecov Report
@@            Coverage Diff             @@
##            maint    #4448      +/-   ##
==========================================
+ Coverage   89.62%   89.65%   +0.02%
==========================================
  Files         288      288
  Lines       40359    40361       +2
==========================================
+ Hits        36173    36185      +12
+ Misses       4186     4176      -10
Yeah, I think comparing to HEAD rather than the working tree is preferable here, even without considering the improved performance. FWIW
I'm not sure I get what you're trying to say here, but I'd feel okay about assuming that no one is relying on the (mis)behavior of |
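To make the HEAD-versus-working-tree distinction concrete, here is a minimal sketch in plain git (not DataLad's implementation). The tag name `published` and the file name `tracked` are made up for illustration; the tag stands in for whatever state `--since` resolves to. `git diff REF` compares REF against the working tree, whereas `git diff REF..HEAD` compares only committed states, so uncommitted edits do not show up in the latter.

```shell
#!/bin/bash
set -eu
# Throwaway repository; all names below are hypothetical.
repo="$(mktemp -d "${TMPDIR:-/tmp}/since-demo-XXXXXXX")"
cd "$repo"
git init -q .
git config user.name demo
git config user.email demo@example.com

echo one > tracked
git add tracked
git commit -q -m "initial state"
git tag published            # stand-in for the last published state

echo two >> tracked          # modify the file WITHOUT committing

# Compares the tag against the working tree: sees the uncommitted edit.
vs_worktree="$(git diff --name-only published)"
# Compares the tag against HEAD: only committed changes count.
vs_head="$(git diff --name-only published..HEAD)"

echo "changed vs working tree: [$vs_worktree]"
echo "changed vs HEAD:         [$vs_head]"
```

With the `..HEAD` form nothing is reported here, because the only change lives in the working tree; that is the kind of inspection the range form lets a caller skip.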
The following script, in the best traditions of @kyleam, demonstrates it:
#!/bin/bash
set -eu
cd "$(mktemp -d "${TMPDIR:-/tmp}/dl-XXXXXXX")"
datalad create src
datalad create -d src src/subds
(
    cd src
    datalad create-sibling -r -s target ../target
    # now modify subds without saving updated state in superdataset
    touch subds/file
    datalad save -d subds -m "added a file" -t 0.1
    datalad publish --to target -r
)
git -C target status
But then messing with this script on this branch (based on maint, btw, although targeting master, so I changed it to use an ssh remote) and adding `--since=`:
#!/bin/bash
set -eu
cd "$(mktemp -d "${TMPDIR:-/tmp}/dl-XXXXXXX")"
datalad create src
datalad create -d src src/subds
(
    cd src
    datalad create-sibling -r -s target "localhost:$PWD/../target"
    # now modify subds without saving updated state in superdataset
    touch subds/file
    datalad save -d subds -m "added a file" -t 0.1
    datalad publish --since= --to target -r
)
git -C target status
results in
and the overall run failing... so I need to tune up this PR to work in those cases :-/ and probably ideally irrespective of |
FTR: I have no objections to anything concerning |
d'oh -- need to get back to this PR... current "performance" is simply not acceptable -- takes over 10 minutes to publish datasets.datalad.org whenever there is nothing to publish according to |
This should avoid a lengthy annotatepath in heavy hierarchies while trying to look for possible changes in the worktree. I think this would not be entirely correct if a subdataset has progressed forward but the change was not yet recorded in the superdataset, but it might work as desired (publish the modified subdataset in its forwarded state to the remote end as well), which might be the current behavior anyway (did not test)
those |
I have added to the existing test -- it should not affect the logic tested below
… effects we used to not support paths for create-sibling, which is why ssh was used. It is no longer needed, and dropping it might speed up running the test in some scenarios; AFAIK it should now work on Windows too (did not try yet, though)
a02a188 to 2a3abc0
This should avoid a lengthy annotatepath in heavy hierarchies while trying to
look for possible changes in the worktree.
I think this would not be entirely correct if a subdataset has
progressed forward but the change was not yet recorded in the superdataset, but it
might work as desired (publish the modified subdataset in its forwarded state to
the remote end as well), which might be the current behavior anyway (did not test).
Might be that it Closes #4446 ;-)
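For reference, the corner case mentioned above (a subdataset that has moved forward while the superdataset still records the older commit) can be spotted with plain git by comparing the gitlink recorded in the superdataset's HEAD with the subdataset's own HEAD. This is only a sketch under assumptions: it uses bare nested git repositories rather than `datalad create`, the `super`/`subds` layout and the `commit` helper are hypothetical, and it is not how `publish` decides anything.

```shell
#!/bin/bash
set -eu
# Hypothetical layout: super/ with a nested repository super/subds.
top="$(mktemp -d "${TMPDIR:-/tmp}/gitlink-demo-XXXXXXX")"
cd "$top"
git init -q super
git init -q super/subds

# Helper (made up for this sketch): commit in repo $1 with message $2.
commit() {
    git -C "$1" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "$2"
}

echo a > super/subds/file1
git -C super/subds add file1
commit super/subds "first"

# Record the nested repository as a gitlink in the superdataset
# (git warns about an embedded repository on stderr; silenced here).
git -C super add subds 2>/dev/null
commit super "record subds"

# Move the subdataset forward WITHOUT saving the new state in super.
echo b > super/subds/file2
git -C super/subds add file2
commit super/subds "second"

# Third field of ls-tree output is the commit recorded for the gitlink.
recorded="$(git -C super ls-tree HEAD subds | awk '{print $3}')"
actual="$(git -C super/subds rev-parse HEAD)"

if [ "$recorded" != "$actual" ]; then
    echo "subds is ahead of the state recorded in super"
fi
```

A comparison restricted to committed states of the superdataset sees only `recorded`, which is why a subdataset in this condition may go unnoticed unless it is inspected directly.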