-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
diff: clean up output for changed files #2982
Comments
Another request just came #770 (comment) The request is related to data dirs - Also, it seems like data file checksums should not be shown by default. Let's prioritize human readability like in Git. An option can add checksums. The output could look like:
|
How about just using the first 7 digits, similar t a short SHA?
or the rel. path in cache
|
or nothing (unless a special flag is used), like in the latest example:
Other questions about ^ that example:
$ dvc diff -R 2 HEAD^
diff ada9d97..1a22314
0 files/dirs untouched, 2 modified, 1 added, 1 deleted.
New: 'fakefile.csv' 751 M
Removed: 'fakefile_2222.txt' 536 K
Modified: 'small.txt' 36K (+6 bytes)
Modified dir: 'images'. 116 files/dirs untouched, 7 modified, 1 added, 1 deleted.
New: 'DM3SZ.jpg` 2.1 M
Modified: 'B8PZM.jpg' 2.5 M (+158 K)
Removed: '4USKW.jpg' 2.6 M
Modified dir: 'faces'. 0 files/dirs untouched, 1 modified, 1 added, 0 deleted.
New: 'pokerface.jpg` 3.7 M
Modified dir: 'animals'. 0 files untouched, 0 modified, 100 added, 0 deleted.
Modified dir: 'faces'. 3 dirs untouched, 1 modified, 1 added, 1 deleted.
Modified dir: ...
...
$ dvc diff -R -1 HEAD^
diff ada9d97..1a22314
0 files or directories untouched, 2 modified, 1 added, 1 deleted.
|
@jorgeorpinel thank you for the questions - all are great points!
Well... possibly. But instead, I'd remove all the new lines for consistency. See
Thank you for bringing this. The problem that I see now - all these spaces as prefixes won't look good at 3rd+ level. We should think about gettig rid of the space-prefixes. It might look like:
I was thinking about the summary only for data dirs. I don't think it is needed for a regular ones and for dirs inside data dirs (see files in
👍 We might show no sizes by default and the numbers with |
My example was $ dvc diff -R -1 HEAD^
diff ada9d97..1a22314
0 files or directories untouched, 2 modified, 1 added, 1 deleted. But you also said
Which makes sense, so not sure we want to provide this after all.
|
@jorgeorpinel I got your point and Do you have any examples like this? It's okay if not. |
Ah no. I don't think |
I had a great chat with @MrOutis re this issue. I'm putting down a summary... (1) Agreed to use this-like format structure: Added:
firmograph.txt
src/dvc.ico
data/dir1/foo.txt
data/dir1/bar.csv
Modified:
source.csv
data/ <<--- note, this is a data dir and this is why it is in output.
data/lorem.txt
data/ipsum.txt
data/dir1/somefile.txt
data/dir1/file2.txt
Deleted:
dir/ <<--- note, this is a data dir and this is why it is in output.
dir/file1.txt (2) Json output should reflect this structure with "added", "modified", "deleted" on the top and list of changes. For data directories Json also needs to include (3) File sizes are not needed now. To solve the file sizes problem holistically we need to include file sizes in dvc-files (when we compute checksums) and then reuse it in other commands including diff. (4) Checksums are not needed by default. But it would be great to have an option like (5) It would be great to see a diff between HEAD and the current workspace by default (like in Git). (6) [We haven't' discussed this but it might be still important] We probably need a header like in the first line of output @MrOutis thanks for the discussion and please comment if you have something to add. |
@dmpetrov , what Git does is to show the different checksums from each file, not the trees that you are comparing. Repeating the arguments at the output seems excessive, since you already know where the diff comes from. Adding such information to the JSON output is a one-liner. Let's add it just if it is needed 🙂 |
Is this a note for us here or an actual part of the output? Not sure I get it either way. I like this proposed structure and format but this new direction for
Yes! But again, kind of like p.s. I see this latest format is consistent with #2995 (comment) too, which is nice. I like consistency 🙂 |
@jorgeorpinel good point about What we need in this functionality is If we decide to incorporate
It is the actual output. We just don't want to lose the information about data/not-data directory that might be important for UI. |
But actually I believe
I agree totally! My suggestion is that maybe we should be reviewing the behavior of
OK but I don't get the meaning of that note. Does it mean e.g. |
@jorgeorpinel I've created #3255 for discussing diff vs status and possible merge.
You are right. If anything was change inside a data dir - the dir will change and we don't want to lose this information (from output). |
Also created #3256 re the file sizes. |
In addition to this... it seems like missing cache files are not handled properly: $ git clone https://github.com/iterative/example-get-started
$ cd example-get-started
$ dvc diff
Deleted:
data/data.xml
data/features/
data/prepared/
model.pkl
files summary: 0 added, 2 deleted, 0 modified
# But there are no deleted files. Just missed caches.
$ dvc pull
$ dvc diff HEAD
# no diff - an expected result Ideally, DVC should get the difference without cache by using just metafiles. An exception is data dirs (from cache). Data dirs should be pulled automatically during |
Related to this issue - we should store file size in addition to hashes. It will give us an ability to show file sizes in Actions for
|
Just the clarify this, is the desired behavior here to The reason I'm asking is because if we get into behavior where ex if diff always pulls data dirs by default, assuming we have a dvc-tracked data directory containing
In this scenario presumably In the second |
After discussion with @dmpetrov, we decided that |
Bugs for this issue have been resolved, there is an existing separate issue for the file size related changes: #3256 |
Hi!
So are the states the same for status and for diff? Is it "not in cache" or is it "missing" (neither in cache nor in remote when -rc are used)? |
@jorgeorpinel The state for diff is "not in workspace and not in local cache". We don't check anything in remotes for diff other than trying to fetch dir cache (list of files that are supposed to be in a directory output) from a default remote (and that is only fetched if it is not already present in local cache). |
Right! I was confused. I think let's try not to mix the concept of |
Context:
Also, it would be great to simplify this output for parsing.
Possible "clean" output:
Actions:
not empty lines within the context of a single file (preferably not empty lines at all).See diff: clean up output for changed files #2982 (comment).--porcelain
EDIT 12/21/19:
The same requirements should apply to directories:
Actions:
--porcelain
(like in Git) or--to-json
to make output format easy-to-parse for scripts.EDIT 12/24/19:
The text was updated successfully, but these errors were encountered: