Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Trim trailing white-space from headings #1004

Merged

Conversation

AnnaKornfeldSimpson
Copy link
Contributor

Description

fix: Use header cache instead of section cache to find headings for shorter loop and greater code readability
fix: Only read and parse the file for tasks if there are listItems in Obsidian cache
refactor: Match arguments of getSection and getPrecedingHeader for greater code readability

Motivation and Context

This PR contains the re-written getPrecedingHeader that I removed from my callout and blockquote PR. It uses the Obsidian-provided heading cache to both shorten the loop (perf) and simplify the extraction of the header text (readability, maintenance).

Also contains a couple other small fixes:
Fixes #959
Readability improvement to have getSection args match getPrecedingHeader args. Note: If there is some reason that I do not know of to keep using the lengthier "pass all args as JS object then add type annotations separately" style, I can swap to using that style for both instead, but I find this style with fewer lines much easier to read.

How has this been tested?

There are no automated tests for this file.
Instead, smoke tested on Windows in up-to-date Obsidian.

Types of changes

Changes visible to users:

  • Bug fix (prefix: fix - non-breaking change which fixes an issue) - note: should not really be visible to users except as perhaps a tiny perf improvement in some large files
  • New feature (prefix: feat - non-breaking change which adds functionality)
  • Breaking change (prefix: feat!! or fix!! - fix or feature that would cause existing functionality to not work as expected)
  • Documentation (prefix: docs - improvements to any documentation content)
  • Sample vault (prefix: vault - improvements to the Tasks-Demo sample vault)

Internal changes:

  • Refactor (prefix: refactor - non-breaking change which only improves the design or structure of existing code, and making no changes to its external behaviour)
  • Tests (prefix: test - additions and improvements to unit tests and the smoke tests)
  • Infrastructure (prefix: chore - examples include GitHub Actions, issue templates)

Checklist

Terms

fix: Use header cache instead of section cache to find headings for shorter loop and greater code readability
fix: Only read and parse the file for tasks if there are listItems in Obsidian cache
refactor: Match call signatures of getSection and getPrecedingHeader for greater code readability
@claremacrae
Copy link
Collaborator

claremacrae commented Aug 9, 2022

Thanks @AnnaKornfeldSimpson ! The code is definitely more readable. I'll do the review later.

First I wanted to test the behaviour of the new code.

It's labelled as a fix, which would be a change in behaviour, but the code change looks to me like it's intending to be pure refactor - so no change in behaviour.

So I though I would see if the behaviour has changed, in my approx 3,600 file vault.

How I tested

I re-instated the old code, then ran old and new, and if they gave a difference in behaviour, I logged the difference.

diff --git a/src/Cache.ts b/src/Cache.ts
index bf492b9..7fe93e0 100644
--- a/src/Cache.ts
+++ b/src/Cache.ts
@@ -288,15 +288,30 @@ export class Cache {
                 }
 
                 const line = fileLines[listItem.position.start.line];
+
+                const precedingHeaderOld = Cache.getPrecedingHeaderOld({
+                    lineNumberTask: listItem.position.start.line,
+                    sections: fileCache.sections,
+                    fileLines,
+                });
+
+                const precedingHeaderNew = Cache.getPrecedingHeader(
+                    listItem.position.start.line,
+                    fileCache.headings,
+                );
+
+                if (precedingHeaderNew !== precedingHeaderOld) {
+                    console.log(
+                        `Header mismatch:\nOld: "${precedingHeaderOld}"\nNew: "${precedingHeaderNew}"`,
+                    );
+                }
+
                 const task = Task.fromLine({
                     line,
                     path: file.path,
                     sectionStart: currentSection.position.start.line,
                     sectionIndex,
-                    precedingHeader: Cache.getPrecedingHeader(
-                        listItem.position.start.line,
-                        fileCache.headings,
-                    ),
+                    precedingHeader: precedingHeaderNew,
                 });
 
                 if (task !== null) {
@@ -347,4 +362,45 @@ export class Cache {
         }
         return precedingHeader;
     }
+
+    private static getPrecedingHeaderOld({
+        lineNumberTask,
+        sections,
+        fileLines,
+    }: {
+        lineNumberTask: number;
+        sections: SectionCache[] | undefined;
+        fileLines: string[];
+    }): string | null {
+        if (sections === undefined) {
+            return null;
+        }
+
+        let precedingHeaderSection: SectionCache | undefined;
+        for (const section of sections) {
+            if (section.type === 'heading') {
+                if (section.position.start.line > lineNumberTask) {
+                    // Break out of the loop as the last header was the preceding one.
+                    break;
+                }
+                precedingHeaderSection = section;
+            }
+        }
+        if (precedingHeaderSection === undefined) {
+            return null;
+        }
+
+        const lineNumberPrecedingHeader =
+            precedingHeaderSection.position.start.line;
+
+        const linePrecedingHeader = fileLines[lineNumberPrecedingHeader];
+
+        const headerRegex = /^#+ +(.*)/u;
+        const headerMatch = linePrecedingHeader.match(headerRegex);
+        if (headerMatch === null) {
+            return null;
+        } else {
+            return headerMatch[1];
+        }
+    }
 }

Results

These are the 4 differences that were logged:

image
Caption: Image showing that the old code retains any spaces at the end of headings, whereas the new code appears to strip those off

Questions

  • Why do those differences occur?
  • Do they matter? For example, what impact would they have on backlinks in to headings with spaces at the end?
  • Does this actually by chance fix any un-discovered existing bugs?
  • Or does it introduce a new bug, if there is other code elsewhere in tasks that would retain the trailing space, and then give a mismatch?

@AnnaKornfeldSimpson
Copy link
Contributor Author

Fascinating! I will take a closer look. We already know that Obsidian manipulates headers to remove punctuation when making the backlinkable version, so perhaps they normalize away trailing spaces as well. I will investigate.

@claremacrae
Copy link
Collaborator

Fascinating! I will take a closer look. We already know that Obsidian manipulates headers to remove punctuation when making the backlinkable version, so perhaps they normalize away trailing spaces as well. I will investigate.

Thank you. If it does throw up any other issues, let's explore how to divide them up in to separate issues/PRs, to keep the scope of this one to just the changes it already makes...

@AnnaKornfeldSimpson
Copy link
Contributor Author

Summary: Everything still works fine with backlinks either way. This does change the result of "group by heading" if the only difference between two headings was trailing space(s).

Small tests.
File1:

# @Computer
- [ ] task1 no-header-space

File2:

# @Computer  
- [ ] task2 header-space

File3:

```tasks
description includes header-space
group by heading
```

Results:

  • Obsidian auto-suggest for link completion normalizes both headings to "Computer". No punctuation (we already knew that), no trailing space. The two headings are considered identical by Obsidian and must be in different files to get backlinked correctly.
  • Tasks backlink creation does not normalize either the punctuation or the trailing space. Obsidian still processes the backlink correctly (it still goes to the right place). Removing the trailing space does not change the correct processing of the backlink.
  • Current Tasks (before this PR) will put the tasks in two different groups since one header has a trailing space and the other does not. I personally do not think this is particularly user friendly behavior.
  • This PR will put both tasks in the same group since the trailing space is no longer stored with the header in file2. The backlinks still work correctly, without the trailing space. I like this behavior better, but I guess this could break something for users? What do you think?

@claremacrae
Copy link
Collaborator

Am slightly struggling to visualise the change in behaviour.

No hurry, but would there be any chance of screenshots of the before-and-after behaviour with that example please?

@AnnaKornfeldSimpson
Copy link
Contributor Author

AnnaKornfeldSimpson commented Aug 9, 2022

Maybe this will be easier to visualize? Same files and query as above.


Before:

@Computer

  • task2 header-space (link formatted: File2 > @Computer ) ✏️

@Computer

  • task1 no-header-space (link formatted: File1 > @Computer) ✏️

After:

@Computer

  • task1 no-header-space (link formatted: File1 > @Computer) ✏️
  • task2 header-space (link formatted: File2 > @Computer) ✏️

EDIT: still kind of difficult to "see" the spaces after the first heading in Before, but of course that's true in Obsidian too.

@claremacrae
Copy link
Collaborator

OK Thanks.

I'm sorry I don't have time to test this right now, but I wanted to capture a thought while I remember... Will test when I have time.

Obsidian auto-suggest for link completion normalizes both headings to "Computer". No punctuation (we already knew that)

I'll be interested in whether the stripping out of any punctuation might lead to unexpected missing results in heading includes searches as a result of this change?

For example, if a heading has a question mark in, and the user includes that ? in their query, will the updated code then fail to find tasks in that heading?

@AnnaKornfeldSimpson
Copy link
Contributor Author

The heading that is saved by Tasks has the punctuation. The punctuation removal is only for making links. (Check your test cases above: lots of punctuation preserved!)

@claremacrae
Copy link
Collaborator

The heading that is saved by Tasks has the punctuation. The punctuation removal is only for making links. (Check your test cases above: lots of punctuation preserved!)

Now I am really confused!!

I thought I/we were talking about how the new implementation of 'get me previous heading, via Obsidian API' in this PR differs from the previous behaviour?

And the tests don't call that code, because I don't know how to call the Obsidian API in tests?

Anyway, not to worry, I will do some empirical testing some time soon...

@AnnaKornfeldSimpson
Copy link
Contributor Author

Sorry, I should not write comments when not entirely awake.
I was referring to the testing you did on your first comment on this PR revealed that this PR does have the behavior change of removing trailing spaces in the previous heading saved by Tasks. That is the only semantic behavior change. It has consequences for "group by heading" behavior if users have some headings with trailing spaces and some not; I think they are user-friendly ones.

It also has implications for #909 where we are trying to understand Obsidian's behavior with heading / heading-link normalization and what the correct construction of backlinks is.

Backlinks do change in this PR to remove trailing spaces - this makes them closer to what Obsidian would auto-generate - both versions are parsed correctly by Obsidian so there is no behavior change there.

Copy link
Collaborator

@claremacrae claremacrae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much indeed for these changes. I learned stuff reading them!

The thing that is slightly holding me off merging now is I am struggling to word the 1-line summary of the PR and the merge commit.

As far as I understand it, the changes are:

  • perf: slightly speed up indexing tasks
  • behaviour change: remove trailing spaces from headings

I can't call the latter one a refactor, even though it simplifies code, as we found it does change behaviour. Refactoring is very strictly about not changing the visible behaviour of the code.

I can't work out to fit both of those things in to one line, for the release notes.

On reflection, I think the behaviour change is the most relevant change for users, so I will reword the PR name, and describe that as a fix.

src/Cache.ts Show resolved Hide resolved
src/Cache.ts Show resolved Hide resolved
src/Cache.ts Show resolved Hide resolved
@claremacrae claremacrae changed the title fix: Small optimizations, readability in Cache.ts fix: Trim trailing white-space from headings Aug 13, 2022
@claremacrae claremacrae merged commit 4f189cb into obsidian-tasks-group:main Aug 13, 2022
@AnnaKornfeldSimpson AnnaKornfeldSimpson deleted the use-header-cache branch August 13, 2022 17:44
AnnaKornfeldSimpson added a commit to AnnaKornfeldSimpson/fork-obsidian-tasks that referenced this pull request Aug 23, 2022
…oup#1004)

fix: Use header cache instead of section cache to find headings for shorter loop and greater code readability.

(This was mainly a refactoring, which massively simplified the code.
However, during testing, we found that the new code strips any whitespace from the end of headings, which
if there are duplicate headings differing only in trailing whitespace, would
merge the seemingly identical headings in 'group by heading'.)

refactor: Only read and parse the file for tasks if there are listItems in Obsidian cache

refactor: Match call signatures of getSection and getPrecedingHeader for greater code readability
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

perf: Unneeded file read and split in Cache.ts indexFile if no lists in file
2 participants