-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Show git diff signs in gutter #1623
Conversation
247db5d
to
eafdc69
Compare
Hi folks, I stumbled upon this PR by accident. As it happens, I implemented the exact same feature for the Micro editor a while ago. Here's how it looks: Two things from Micro's implementation work quite nicely and might be worth considering for Helix as well:
You can try out the implementation by installing and running Micro, pressing Ctrl+E and entering the command Anyway, that's all. I absolutely love what you are building here. Have a great day! |
eafdc69
to
1c9e486
Compare
Hi @p-e-w, thanks for reaching out ! The upper one eighth block looks perfect for deletion and I've replaced the previous symbol with it. Since the discussion on the the placement of diff gutter also kind of got stalled, I went with the left aligned blocks for the added and modified markers with the gutter itself placed to the left most edge, essentially mimicking your screenshot :) I also like the idea of doing the diffing in the editor itself, especially if it can make update-as-you-type fast enough to be implemented. Currently it's updated only on save, which makes for a sub-optimal experience IMO. We already pull in the |
We can do the diffing, but you'd still need a way to query for the original file. |
f732123
to
5a9fc33
Compare
The diff is now updated live, on every change. The main problem left to tackle is to bypass converting the document Rope to a string every time the diff needs to be updated (so practically on most keypresses). |
c96c98c
to
0cd310b
Compare
Diffing avoids string conversion now by wrapping a Rope, implemented in 0cd310b 🎉 |
|
||
impl<'a> RopeLine<'a> { | ||
pub fn from_rope(rope: &'a Rope) -> Vec<Self> { | ||
rope.lines().into_iter().map(RopeLine).collect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: .lines()
is not very efficient: cessen/ropey#25
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I read O(1)
in the chunks iterator and assumed lines would also follow suit. @cessen how hard/big of a change would it be to make the lines iterator also O(1) ? I suppose if it was easy enough it would've already been added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's non-trivial, for sure. I have a pretty good idea how to go about it, but it amounts to a rewrite of the lines iterator, and the code will be pretty subtle. I do still want to get to it eventually, but it's definitely a pain, which is why I've put it off for so long, ha ha.
Are you running into performance issues with the current lines iterator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From casual use it seems snappy and I haven't noticed any performance issues, so this is probably not too serious of an issue :) But then again this is a hotpath since it's called on almost every keystroke, and I haven't run any actual benchmarks yet either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then again this is a hotpath since it's called on almost every keystroke
Synchronously?
Thinking about it more, I would expect the diffing itself to be expensive enough that if the Lines
iterator is causing performance issues then the diffing would be as well (i.e. even if Lines
was infinitely fast, you'd still have issues). So if this feature does turn out to cause performance problems (e.g. on large files or with large diffs), then rearchitecting it to work in a separate thread or something like that might be needed anyway.
EDIT: I see you're deadlining the diffing process. See my other comment.
}; | ||
|
||
let timeout = Duration::from_millis(250); | ||
let diff = similar::capture_diff_slices_deadline( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading your code correctly, it looks like you're collecting all the lines into a Vec
in order to pass them to capture_diff_slices_deadline()
, so even though this has a deadline set you can still be bottle necked on collecting everything into a Vec
.
Instead, you could use capture_diff_deadline()
by wrapping the Rope
in a type that implements Index
and returns the lines through indexing. That way the entire procedure can be deadlined.
I could also make this even easier for you by implementing Hash
on RopeSlice
(which is really just an accidental omission on my part in Ropey).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with Index
is that it requires the Output
to be a reference, whereas each line is returned as a RopeSlice
. This might depend on rust-lang/generic-associated-types-initiative#2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! Right, I feel silly now. Yeah, that's unfortunate. I wonder if they'd accept a pull request to add a variant that takes a |usize| -> Item
closure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only diff using similar
initially and now use changesets to incrementally compute the diffs, so it's no longer executed on every keypress. It'll still be nice to optimize it using indexing since we don't want to stall on huge text files on startup.
Thinking about this some more, it seems to me that we could compute changed ranges based on helix changesets (after maybe doing an initial diff against git?). We already have all the information needed to incrementally update the diff in the transaction rather than having to go through git and diffing full files every single time (these files could also be very large). |
helix-vcs/src/rope.rs
Outdated
impl Hash for RopeLine<'_> { | ||
fn hash<H: std::hash::Hasher>(&self, state: &mut H) { | ||
for chunk in self.0.chunks() { | ||
chunk.hash(state); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now implemented Hash
directly on Rope
and RopeSlice
(in the recently released Ropey 1.4.0), so this should be unnecessary now.
It's also worth noting that your implementation here is incorrect (not to criticize, just an interesting technical note), which I only noticed when diving into the trait to implement it myself:
&str
's Hash
implementation, which you're calling in the loop here, is purposefully implemented so that hashing a series of string slices yields a different hash than hashing the same data as a single concatenated string. Which means that if two Rope
s have identical contents, but the chunks happen to be split up differently, your implementation will give two different hashes for them.
Instead, you actually need to call state.write(chunk.as_bytes())
in the inner loop.
The reason &str
's Hash
implementation works like that is because of #[derive(Hash)]
. Imagine this struct:
#[derive(Hash)]
struct Person<'a> {
first_name: &'a str,
last_name: &'a str,
}
If concatenated strings gave the same hash, then someone named "Jon Athanbor" and "Jonathan Bor" would hash to the same value if the names were stored in lower case. (A contrived example, I know, but it's just to illustrate.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah usually you prefix each chunk with a hash of the len
to solve for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didn't know this; thanks for pointing it out ! Thanks for the Hash impl too :)
What about changes at a distance, like adding a "use std::path::Path;" with rust-analyzer from a use point hundreds of line down the line ? Is the information for those available ? |
Yes, all changes are always applied through a changeset/transaction. |
Nice ! |
0cd310b
to
2de80fa
Compare
2de80fa
to
f7bcd3f
Compare
let reverted_tx = transaction.invert(&old_doc); | ||
|
||
// Consider H to be the text of the file in git HEAD, and B₁ to be | ||
// the text when buffer initially loads. H → B₁ is the changeset | ||
// that describes the diff between HEAD and buffer text. Inverting | ||
// this produces B₁ → H, which is initially saved to `diff_changes`. | ||
// In the next edit, buffer text changes to B₂. The transaction | ||
// describes the change B₁ → B₂. Inverting this gives B₂ → B₁. | ||
// Composing this with the saved `diff_changes` gives us | ||
// (B₂ → B₁) → (B₁ → H) = B₂ → H. Whenever examining a change X₁ → X₂, | ||
// we need to know the contents of the text at state X₁ to know where | ||
// to apply the operations in the changeset. The advantage of inverting and | ||
// composing this way instead of simply composing (which would give | ||
// us H → B₂ in this example) is that we no longer need the HEAD text | ||
// and we can instead use the current text in the buffer. | ||
if let Some(changes) = self.diff_changes.take() { | ||
let reverted_changes = reverted_tx.changes().clone(); | ||
let changes = reverted_changes.compose(changes); | ||
self.diff_changes = Some(changes); | ||
self.diff_with_base(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use changesets now, and use the approach described in the comment. It also has the advantage that we don't have to manage a cache of the file contents in HEAD since we don't need it to generate the diff.
This works fine for most of the cases except for deleting something and undoing it (there are some other rough corners, but they can be easily solved at the rendering level). Normally, composing a change and it's undo operation produces an empty changeset. But since we use inverted changes here, it doesn't apply cleanly.
I'll write a better explanation of the problem tomorrow, but it boils down to the fact that composing for example [Delete(5)]
and [Insert("hello")]
(note the order) does not produce an empty changeset, where as composing an insert and then a delete gives a changeset of []
.
Relevant test case:
Patch for the test
:100644 100644 daf4a77e 00000000 M helix-core/src/transaction.rs
diff --git a/helix-core/src/transaction.rs b/helix-core/src/transaction.rs
index daf4a77e..9aed26e8 100644
--- a/helix-core/src/transaction.rs
+++ b/helix-core/src/transaction.rs
@@ -579,6 +579,17 @@ mod test {
use super::*;
use crate::State;
+ #[test]
+ fn composing_delete_and_insert_leaves_empty_changeset() {
+ let rope = Rope::from("");
+ let mut delete = ChangeSet::new(&rope);
+ delete.delete(5);
+ let mut insert = ChangeSet::new(&rope);
+ insert.insert("hello".into());
+
+ let changes = delete.compose(insert);
+ assert_eq!(changes.changes, &[]);
+ }
#[test]
fn composition() {
use Operation::*;
fn composing_delete_and_insert_leaves_empty_changeset() {
let rope = Rope::from("");
let mut delete = ChangeSet::new(&rope);
delete.delete(5);
let mut insert = ChangeSet::new(&rope);
insert.insert("hello".into());
let changes = delete.compose(insert);
assert_eq!(changes.changes, &[]);
}
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html | ||
|
||
[dependencies] | ||
git2 = { version = "0.13", default-features = false } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we really want to use the c bindings here instead of something that is written in rust? Like gix but not sure if they support diff yet. It seemed to have the stuff we need from a quick look https://github.com/Byron/gitoxide/blob/main/git-repository/src/types.rs
|
||
#[derive(Debug, Default)] | ||
pub struct Registry { | ||
inner: HashMap<RepoRoot, Rc<RefCell<Git>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it's a good idea to have this as Git
given that it is called vcs, I thought it is supposed to be extensible to other vcs. Also, I wonder why is it Rc<RefCell
? So we can edit?
pub fn set_version_control(&mut self, vcs: Option<Rc<RefCell<helix_vcs::Git>>>) { | ||
self.version_control = vcs; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just do version_control_mut
for this?
let changes = compare_ropes(diff_base, self.text()).changes().clone(); | ||
let changes = changes.invert(diff_base); | ||
self.diff_changes = Some(changes); | ||
drop(vcs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please leave a comment on the drop here?
it would be great if this could be easily toggled - it's a useful feature, but it's visually distracting to see it all the time. |
Closing in favor of #3890. Thanks @pascalkuthe for picking it up! |
Shows the changes to the current file symbolically in the gutter:
It looks a bit too cramped with diagnostics, so maybe the reserved space should be increased but that will make the gutter permanently too big (there's currently no way to "toggle" a gutter component). Changes are reflected only when the file is saved. I would like to show an accurate diff even if the file is unsaved, but this should be a good starting point.
The symbols are hard coded for now to keep the PR simple, and a follow up PR will make it configurable. The default might also need changing (though I would like to avoid too much bike shedding ;)
Supersedes #467.