Commit

Merge pull request #73 from xandeer/master
fix: failed to parse headline tags like ":hello:你好:"
rasendubi authored May 11, 2023
2 parents a74a80b + 67579ad commit dc3a84c
Showing 4 changed files with 38 additions and 1 deletion.
5 changes: 5 additions & 0 deletions .changeset/quiet-walls-refuse.md
@@ -0,0 +1,5 @@
---
'uniorg-parse': patch
---

Fix parsing unicode characters in headline tags. The regex for parsing tags previously used `\w` (word) class, which does not behave correctly with unicode. Update it to use unicode's Letter and Number character properties instead.
25 changes: 25 additions & 0 deletions packages/uniorg-parse/src/__snapshots__/parser.spec.ts.snap
@@ -2053,6 +2053,31 @@ children:
value: "Something"
`;
exports[`org/parser headline statistics-cookie non-ascii characters in headline tags 1`] = `
type: "org-data"
contentsBegin: 0
contentsEnd: 21
children:
- type: "section"
contentsBegin: 0
contentsEnd: 21
children:
- type: "headline"
level: 1
todoKeyword: null
priority: null
commented: false
rawValue: "headline"
tags:
- "hello"
- "你好"
contentsBegin: 2
contentsEnd: 10
children:
- type: "text"
value: "headline"
`;
exports[`org/parser headline statistics-cookie statistics cookie with long trailing space 1`] = `
type: "org-data"
contentsBegin: 0
5 changes: 5 additions & 0 deletions packages/uniorg-parse/src/parser.spec.ts
@@ -124,6 +124,11 @@ describe('org/parser', () => {
`* TODO [#A] COMMENT headline /italic/ title :some:tags: [1/3]`
);

itParses(
'non-ascii characters in headline tags',
`* headline :hello:你好:`
);

itParses('statistics cookie without trailing space', `* [/]hello`);

itParses('statistics cookie with long trailing space', `* [/] hello`);
4 changes: 3 additions & 1 deletion packages/uniorg-parse/src/parser.ts
@@ -725,7 +725,9 @@ class Parser {

const titleStart = this.r.offset();

- const tagsM = this.r.lookingAt(/^(.*?)[ \t]+:([\w@#%:]+):[ \t]*$/m);
+ const tagsM = this.r.lookingAt(
+ /^(.*?)[ \t]+:([\p{L}\p{N}_@#%:]+):[ \t]*$/mu
+ );
const tags = tagsM?.[2].split(':') ?? [];
const titleEnd = tagsM
? titleStart + tagsM.index + tagsM[1].length
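To see why the character class had to change, here is a minimal standalone sketch that runs both patterns from the `parser.ts` hunk against the title text used in the new test. The names `OLD_TAGS_RE`, `NEW_TAGS_RE`, and `splitTags` are hypothetical and exist only for this illustration; they are not part of uniorg-parse, and the sketch assumes the leading `* ` has already been consumed before the pattern is applied.

```typescript
// Hypothetical names for illustration; only the two patterns come from the diff.
const OLD_TAGS_RE = /^(.*?)[ \t]+:([\w@#%:]+):[ \t]*$/m;
const NEW_TAGS_RE = /^(.*?)[ \t]+:([\p{L}\p{N}_@#%:]+):[ \t]*$/mu;

interface SplitResult {
  title: string;
  tags: string[];
}

// Separate a trailing ":tag1:tag2:" group from the rest of a headline title.
// If the pattern does not match, the whole string is treated as the title.
function splitTags(re: RegExp, text: string): SplitResult {
  const m = re.exec(text);
  return m
    ? { title: m[1], tags: m[2].split(':') }
    : { title: text, tags: [] };
}

const text = 'headline :hello:你好:';

// `\w` without the `u` flag is [A-Za-z0-9_], so 你好 cannot be part of a tag
// and the whole tag group fails to match.
console.log(splitTags(OLD_TAGS_RE, text));
// title: "headline :hello:你好:", tags: []

// `\p{L}` and `\p{N}` match any Unicode letter or number. They are only
// recognized when the `u` flag is set, and `_` must be listed explicitly
// because it is neither a letter nor a number.
console.log(splitTags(NEW_TAGS_RE, text));
// title: "headline", tags: ["hello", "你好"]
```

Note that dropping the `u` flag from the new pattern would not raise an error: `\p` would silently be read as a literal `p`, so the flag and the property escapes have to change together, as they do in this commit.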
