fix: Trim trailing white-space from headings #1004

AnnaKornfeldSimpson · 2022-08-09T01:21:25Z

Description

fix: Use header cache instead of section cache to find headings for shorter loop and greater code readability
fix: Only read and parse the file for tasks if there are listItems in Obsidian cache
refactor: Match arguments of getSection and getPrecedingHeader for greater code readability

Motivation and Context

This PR contains the re-written getPrecedingHeader that I removed from my callout and blockquote PR. It uses the Obsidian-provided heading cache to both shorten the loop (perf) and simplify the extraction of the header text (readability, maintenance).

Also contains a couple other small fixes:
Fixes #959
Readability improvement to have getSection args match getPrecedingHeader args. Note: If there is some reason that I do not know of to keep using the lengthier "pass all args as JS object then add type annotations separately" style, I can swap to using that style for both instead, but I find this style with fewer lines much easier to read.

How has this been tested?

There are no automated tests for this file.
Instead, smoke tested on Windows in up-to-date Obsidian.

Types of changes

Changes visible to users:

Bug fix (prefix: fix - non-breaking change which fixes an issue) - note: should not really be visible to users except as perhaps a tiny perf improvement in some large files
New feature (prefix: feat - non-breaking change which adds functionality)
Breaking change (prefix: feat!! or fix!! - fix or feature that would cause existing functionality to not work as expected)
Documentation (prefix: docs - improvements to any documentation content)
Sample vault (prefix: vault - improvements to the Tasks-Demo sample vault)

Internal changes:

Refactor (prefix: refactor - non-breaking change which only improves the design or structure of existing code, and making no changes to its external behaviour)
Tests (prefix: test - additions and improvements to unit tests and the smoke tests)
Infrastructure (prefix: chore - examples include GitHub Actions, issue templates)

Checklist

My code follows the code style of this project and passes yarn run lint.
My change requires a change to the documentation.
I have updated the documentation accordingly.
My change has adequate Unit Test coverage.

Terms

My contribution follow this project's contributing guide
I agree to follow this project's Code of Conduct

fix: Use header cache instead of section cache to find headings for shorter loop and greater code readability fix: Only read and parse the file for tasks if there are listItems in Obsidian cache refactor: Match call signatures of getSection and getPrecedingHeader for greater code readability

claremacrae · 2022-08-09T12:38:57Z

Thanks @AnnaKornfeldSimpson ! The code is definitely more readable. I'll do the review later.

First I wanted to test the behaviour of the new code.

It's labelled as a fix, which would be a change in behaviour, but the code change looks to me like it's intending to be pure refactor - so no change in behaviour.

So I though I would see if the behaviour has changed, in my approx 3,600 file vault.

How I tested

I re-instated the old code, then ran old and new, and if they gave a difference in behaviour, I logged the difference.

diff --git a/src/Cache.ts b/src/Cache.ts
index bf492b9..7fe93e0 100644
--- a/src/Cache.ts
+++ b/src/Cache.ts
@@ -288,15 +288,30 @@ export class Cache {
                 }
 
                 const line = fileLines[listItem.position.start.line];
+
+                const precedingHeaderOld = Cache.getPrecedingHeaderOld({
+                    lineNumberTask: listItem.position.start.line,
+                    sections: fileCache.sections,
+                    fileLines,
+                });
+
+                const precedingHeaderNew = Cache.getPrecedingHeader(
+                    listItem.position.start.line,
+                    fileCache.headings,
+                );
+
+                if (precedingHeaderNew !== precedingHeaderOld) {
+                    console.log(
+                        `Header mismatch:\nOld: "${precedingHeaderOld}"\nNew: "${precedingHeaderNew}"`,
+                    );
+                }
+
                 const task = Task.fromLine({
                     line,
                     path: file.path,
                     sectionStart: currentSection.position.start.line,
                     sectionIndex,
-                    precedingHeader: Cache.getPrecedingHeader(
-                        listItem.position.start.line,
-                        fileCache.headings,
-                    ),
+                    precedingHeader: precedingHeaderNew,
                 });
 
                 if (task !== null) {
@@ -347,4 +362,45 @@ export class Cache {
         }
         return precedingHeader;
     }
+
+    private static getPrecedingHeaderOld({
+        lineNumberTask,
+        sections,
+        fileLines,
+    }: {
+        lineNumberTask: number;
+        sections: SectionCache[] | undefined;
+        fileLines: string[];
+    }): string | null {
+        if (sections === undefined) {
+            return null;
+        }
+
+        let precedingHeaderSection: SectionCache | undefined;
+        for (const section of sections) {
+            if (section.type === 'heading') {
+                if (section.position.start.line > lineNumberTask) {
+                    // Break out of the loop as the last header was the preceding one.
+                    break;
+                }
+                precedingHeaderSection = section;
+            }
+        }
+        if (precedingHeaderSection === undefined) {
+            return null;
+        }
+
+        const lineNumberPrecedingHeader =
+            precedingHeaderSection.position.start.line;
+
+        const linePrecedingHeader = fileLines[lineNumberPrecedingHeader];
+
+        const headerRegex = /^#+ +(.*)/u;
+        const headerMatch = linePrecedingHeader.match(headerRegex);
+        if (headerMatch === null) {
+            return null;
+        } else {
+            return headerMatch[1];
+        }
+    }
 }

Results

These are the 4 differences that were logged:

Caption: Image showing that the old code retains any spaces at the end of headings, whereas the new code appears to strip those off

Questions

Why do those differences occur?
Do they matter? For example, what impact would they have on backlinks in to headings with spaces at the end?
Does this actually by chance fix any un-discovered existing bugs?
Or does it introduce a new bug, if there is other code elsewhere in tasks that would retain the trailing space, and then give a mismatch?

AnnaKornfeldSimpson · 2022-08-09T15:07:46Z

Fascinating! I will take a closer look. We already know that Obsidian manipulates headers to remove punctuation when making the backlinkable version, so perhaps they normalize away trailing spaces as well. I will investigate.

claremacrae · 2022-08-09T15:18:24Z

Fascinating! I will take a closer look. We already know that Obsidian manipulates headers to remove punctuation when making the backlinkable version, so perhaps they normalize away trailing spaces as well. I will investigate.

Thank you. If it does throw up any other issues, let's explore how to divide them up in to separate issues/PRs, to keep the scope of this one to just the changes it already makes...

AnnaKornfeldSimpson · 2022-08-09T17:51:16Z

Summary: Everything still works fine with backlinks either way. This does change the result of "group by heading" if the only difference between two headings was trailing space(s).

Small tests.
File1:

# @Computer
- [ ] task1 no-header-space

File2:

# @Computer  
- [ ] task2 header-space

File3:

```tasks
description includes header-space
group by heading
```

Results:

Obsidian auto-suggest for link completion normalizes both headings to "Computer". No punctuation (we already knew that), no trailing space. The two headings are considered identical by Obsidian and must be in different files to get backlinked correctly.
Tasks backlink creation does not normalize either the punctuation or the trailing space. Obsidian still processes the backlink correctly (it still goes to the right place). Removing the trailing space does not change the correct processing of the backlink.
Current Tasks (before this PR) will put the tasks in two different groups since one header has a trailing space and the other does not. I personally do not think this is particularly user friendly behavior.
This PR will put both tasks in the same group since the trailing space is no longer stored with the header in file2. The backlinks still work correctly, without the trailing space. I like this behavior better, but I guess this could break something for users? What do you think?

claremacrae · 2022-08-09T21:28:34Z

Am slightly struggling to visualise the change in behaviour.

No hurry, but would there be any chance of screenshots of the before-and-after behaviour with that example please?

AnnaKornfeldSimpson · 2022-08-09T21:35:14Z

Maybe this will be easier to visualize? Same files and query as above.

Before:

`@Computer`

task2 header-space (link formatted: File2 > @Computer ) ✏️

`@Computer`

task1 no-header-space (link formatted: File1 > @Computer) ✏️

After:

`@Computer`

task1 no-header-space (link formatted: File1 > @Computer) ✏️
task2 header-space (link formatted: File2 > @Computer) ✏️

EDIT: still kind of difficult to "see" the spaces after the first heading in Before, but of course that's true in Obsidian too.

claremacrae · 2022-08-10T07:11:34Z

OK Thanks.

I'm sorry I don't have time to test this right now, but I wanted to capture a thought while I remember... Will test when I have time.

Obsidian auto-suggest for link completion normalizes both headings to "Computer". No punctuation (we already knew that)

I'll be interested in whether the stripping out of any punctuation might lead to unexpected missing results in heading includes searches as a result of this change?

For example, if a heading has a question mark in, and the user includes that ? in their query, will the updated code then fail to find tasks in that heading?

AnnaKornfeldSimpson · 2022-08-10T15:21:13Z

The heading that is saved by Tasks has the punctuation. The punctuation removal is only for making links. (Check your test cases above: lots of punctuation preserved!)

claremacrae · 2022-08-10T15:40:47Z

The heading that is saved by Tasks has the punctuation. The punctuation removal is only for making links. (Check your test cases above: lots of punctuation preserved!)

Now I am really confused!!

I thought I/we were talking about how the new implementation of 'get me previous heading, via Obsidian API' in this PR differs from the previous behaviour?

And the tests don't call that code, because I don't know how to call the Obsidian API in tests?

Anyway, not to worry, I will do some empirical testing some time soon...

AnnaKornfeldSimpson · 2022-08-10T16:35:02Z

Sorry, I should not write comments when not entirely awake.
I was referring to the testing you did on your first comment on this PR revealed that this PR does have the behavior change of removing trailing spaces in the previous heading saved by Tasks. That is the only semantic behavior change. It has consequences for "group by heading" behavior if users have some headings with trailing spaces and some not; I think they are user-friendly ones.

It also has implications for #909 where we are trying to understand Obsidian's behavior with heading / heading-link normalization and what the correct construction of backlinks is.

Backlinks do change in this PR to remove trailing spaces - this makes them closer to what Obsidian would auto-generate - both versions are parsed correctly by Obsidian so there is no behavior change there.

claremacrae

Thank you very much indeed for these changes. I learned stuff reading them!

The thing that is slightly holding me off merging now is I am struggling to word the 1-line summary of the PR and the merge commit.

As far as I understand it, the changes are:

perf: slightly speed up indexing tasks
behaviour change: remove trailing spaces from headings

I can't call the latter one a refactor, even though it simplifies code, as we found it does change behaviour. Refactoring is very strictly about not changing the visible behaviour of the code.

I can't work out to fit both of those things in to one line, for the release notes.

On reflection, I think the behaviour change is the most relevant change for users, so I will reword the PR name, and describe that as a fix.

src/Cache.ts

…oup#1004) fix: Use header cache instead of section cache to find headings for shorter loop and greater code readability. (This was mainly a refactoring, which massively simplified the code. However, during testing, we found that the new code strips any whitespace from the end of headings, which if there are duplicate headings differing only in trailing whitespace, would merge the seemingly identical headings in 'group by heading'.) refactor: Only read and parse the file for tasks if there are listItems in Obsidian cache refactor: Match call signatures of getSection and getPrecedingHeader for greater code readability

AnnaKornfeldSimpson mentioned this pull request Aug 9, 2022

fix: Show any markdown in group headings correctly #909

Merged

8 tasks

claremacrae approved these changes Aug 13, 2022

View reviewed changes

src/Cache.ts Show resolved Hide resolved

src/Cache.ts Show resolved Hide resolved

src/Cache.ts Show resolved Hide resolved

claremacrae changed the title ~~fix: Small optimizations, readability in Cache.ts~~ fix: Trim trailing white-space from headings Aug 13, 2022

claremacrae merged commit 4f189cb into obsidian-tasks-group:main Aug 13, 2022

claremacrae mentioned this pull request Aug 13, 2022

refactor: simplify Cache.indexFile() #1017

Merged

7 tasks

AnnaKornfeldSimpson deleted the use-header-cache branch August 13, 2022 17:44

AnnaKornfeldSimpson mentioned this pull request Aug 13, 2022

refactor: paramter style in Task.ts (except constructor) #1018

Closed

14 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Trim trailing white-space from headings #1004

fix: Trim trailing white-space from headings #1004

AnnaKornfeldSimpson commented Aug 9, 2022

claremacrae commented Aug 9, 2022 •

edited

Loading

AnnaKornfeldSimpson commented Aug 9, 2022

claremacrae commented Aug 9, 2022

AnnaKornfeldSimpson commented Aug 9, 2022

claremacrae commented Aug 9, 2022

AnnaKornfeldSimpson commented Aug 9, 2022 •

edited

Loading

claremacrae commented Aug 10, 2022

AnnaKornfeldSimpson commented Aug 10, 2022

claremacrae commented Aug 10, 2022

AnnaKornfeldSimpson commented Aug 10, 2022

claremacrae left a comment

fix: Trim trailing white-space from headings #1004

fix: Trim trailing white-space from headings #1004

Conversation

AnnaKornfeldSimpson commented Aug 9, 2022

Description

Motivation and Context

How has this been tested?

Types of changes

Checklist

Terms

claremacrae commented Aug 9, 2022 • edited Loading

How I tested

Results

Questions

AnnaKornfeldSimpson commented Aug 9, 2022

claremacrae commented Aug 9, 2022

AnnaKornfeldSimpson commented Aug 9, 2022

claremacrae commented Aug 9, 2022

AnnaKornfeldSimpson commented Aug 9, 2022 • edited Loading

@Computer

@Computer

@Computer

claremacrae commented Aug 10, 2022

AnnaKornfeldSimpson commented Aug 10, 2022

claremacrae commented Aug 10, 2022

AnnaKornfeldSimpson commented Aug 10, 2022

claremacrae left a comment

Choose a reason for hiding this comment

claremacrae commented Aug 9, 2022 •

edited

Loading

AnnaKornfeldSimpson commented Aug 9, 2022 •

edited

Loading

`@Computer`

`@Computer`

`@Computer`