Some `text` node has not `position` property #16

azu · 2021-05-12T03:19:08Z

Subject of the issue

When parsing some text, the text node has not position property.

Your environment

OS: macOs 10.15.7
Packages:

    "remark-gfm": "^1.0.0",
    "remark-parse": "^9.0.0",
    "unified": "^9.2.1"

Env:

Node: 14.16.1 - ~/.volta/tools/image/node/14.16.1/bin/node
Yarn: 1.22.10 - ~/.volta/tools/image/yarn/1.22.10/bin/yarn
npm: 6.14.12 - ~/.volta/tools/image/node/14.16.1/bin/npm

Steps to reproduce

Reproducing repo: https://github.com/azu/remark-gfm-no-position-bug

import unified from "unified";
import remarkGfm from "remark-gfm";
import remarkParse from "remark-parse";

const remark = unified().use(remarkParse).use(remarkGfm);
const ast = remark.parse(`http&#x3A;//user:password@host:port/path?key=value#fragment`);
console.log(JSON.stringify(ast, null, 4));

output:

{
    "type": "root",
    "children": [
        {
            "type": "paragraph",
            "children": [
                {
                    "type": "text",
                    "value": "http://user:password@host:port/path?key=value#fragment"
                }
            ],
            "position": {
                "start": {
                    "line": 1,
                    "column": 1,
                    "offset": 0
                },
                "end": {
                    "line": 1,
                    "column": 60,
                    "offset": 59
                }
            }
        }
    ],
    "position": {
        "start": {
            "line": 1,
            "column": 1,
            "offset": 0
        },
        "end": {
            "line": 1,
            "column": 60,
            "offset": 59
        }
    }
}

Expected behavior

The text node should have position property.

{
    "type": "root",
    "children": [
        {
            "type": "paragraph",
            "children": [
                {
                    "type": "text",
                    "value": "http://user:password@host:port/path?key=value#fragment",
                    "position": {
                        "start": {
                            "line": 1,
                            "column": 1,
                            "offset": 0
                        },
                        "end": {
                            "line": 1,
                            "column": 60,
                            "offset": 59
                        }
                    }
                }
            ],
            "position": {
                "start": {
                    "line": 1,
                    "column": 1,
                    "offset": 0
                },
                "end": {
                    "line": 1,
                    "column": 60,
                    "offset": 59
                }
            }
        }
    ],
    "position": {
        "start": {
            "line": 1,
            "column": 1,
            "offset": 0
        },
        "end": {
            "line": 1,
            "column": 60,
            "offset": 59
        }
    }
}

Actual behavior

The text node has not position property.

{
    "type": "root",
    "children": [
        {
            "type": "paragraph",
            "children": [
                {
                    "type": "text",
                    "value": "http://user:password@host:port/path?key=value#fragment"
                }
            ],
            "position": {
                "start": {
                    "line": 1,
                    "column": 1,
                    "offset": 0
                },
                "end": {
                    "line": 1,
                    "column": 60,
                    "offset": 59
                }
            }
        }
    ],
    "position": {
        "start": {
            "line": 1,
            "column": 1,
            "offset": 0
        },
        "end": {
            "line": 1,
            "column": 60,
            "offset": 59
        }
    }
}

Escaping + http:// is something wrong:
http://a

The text was updated successfully, but these errors were encountered:

wooorm · 2021-05-14T12:02:29Z

Unfortunately we can’t add them.

GH’s uses two algorithms to add autolinks (see this readme and some of the other issues there). One at parse time, another at “AST” time.

The one at parse time nicely adds positional info. The one at AST-time can’t infer the original position of certain URLs from text nodes, because the AST only has the starting and ending point, but no knowledge in this case:

- > a block quote in a list
    with https://example.com a URL

But, the fact that http://user:password@host:port/path?key=value#fragment is seen as a URL is incorrect 🤔

http&#x3A;//user:password@host:port/path?key=value#fragment
http://user:password@host:port/path?key=value#fragment

http://user:password@host:port/path?key=value#fragment
http://user:password@host:port/path?key=value#fragment

^-- it shouldn’t link

azu · 2021-05-14T22:05:30Z

However, It seems that gfm plugin can infer the position when parsing valid http://~ URL string.

import unified from "unified";
import remarkGfm from "remark-gfm";
import remarkParse from "remark-parse";

const remark = unified().use(remarkParse).use(remarkGfm);
const ast = remark.parse(`This is https://example.com`);
console.log(JSON.stringify(ast, null, 4));

to

{
    "type": "root",
    "children": [
        {
            "type": "paragraph",
            "children": [
                {
                    "type": "text",
                    "value": "This is ",
                    "position": {
                        "start": {
                            "line": 1,
                            "column": 1,
                            "offset": 0
                        },
                        "end": {
                            "line": 1,
                            "column": 9,
                            "offset": 8
                        }
                    }
                },
                {
                    "type": "link",
                    "title": null,
                    "url": "https://example.com",
                    "children": [
                        {
                            "type": "text",
                            "value": "https://example.com",
                            "position": {
                                "start": {
                                    "line": 1,
                                    "column": 9,
                                    "offset": 8
                                },
                                "end": {
                                    "line": 1,
                                    "column": 28,
                                    "offset": 27
                                }
                            }
                        }
                    ],
                    "position": {
                        "start": {
                            "line": 1,
                            "column": 9,
                            "offset": 8
                        },
                        "end": {
                            "line": 1,
                            "column": 28,
                            "offset": 27
                        }
                    }
                }
            ],
            "position": {
                "start": {
                    "line": 1,
                    "column": 1,
                    "offset": 0
                },
                "end": {
                    "line": 1,
                    "column": 28,
                    "offset": 27
                }
            }
        }
    ],
    "position": {
        "start": {
            "line": 1,
            "column": 1,
            "offset": 0
        },
        "end": {
            "line": 1,
            "column": 28,
            "offset": 27
        }
    }
}

I think that it is a bug related to http://.

result of list + blockquote + autolink

import unified from "unified";
import remarkGfm from "remark-gfm";
import remarkParse from "remark-parse";

const remark = unified().use(remarkParse).use(remarkGfm);
const ast = remark.parse(`- > a block quote in a list
    with https://example.com a URL`);
console.log(JSON.stringify(ast, null, 4));

to

{
    "type": "root",
    "children": [
        {
            "type": "list",
            "ordered": false,
            "start": null,
            "spread": false,
            "children": [
                {
                    "type": "listItem",
                    "spread": false,
                    "checked": null,
                    "children": [
                        {
                            "type": "blockquote",
                            "children": [
                                {
                                    "type": "paragraph",
                                    "children": [
                                        {
                                            "type": "text",
                                            "value": "a block quote in a list",
                                            "position": {
                                                "start": {
                                                    "line": 1,
                                                    "column": 5,
                                                    "offset": 4
                                                },
                                                "end": {
                                                    "line": 1,
                                                    "column": 28,
                                                    "offset": 27
                                                }
                                            }
                                        }
                                    ],
                                    "position": {
                                        "start": {
                                            "line": 1,
                                            "column": 5,
                                            "offset": 4
                                        },
                                        "end": {
                                            "line": 1,
                                            "column": 28,
                                            "offset": 27
                                        }
                                    }
                                }
                            ],
                            "position": {
                                "start": {
                                    "line": 1,
                                    "column": 3,
                                    "offset": 2
                                },
                                "end": {
                                    "line": 1,
                                    "column": 28,
                                    "offset": 27
                                }
                            }
                        },
                        {
                            "type": "paragraph",
                            "children": [
                                {
                                    "type": "text",
                                    "value": "with ",
                                    "position": {
                                        "start": {
                                            "line": 2,
                                            "column": 5,
                                            "offset": 32
                                        },
                                        "end": {
                                            "line": 2,
                                            "column": 10,
                                            "offset": 37
                                        }
                                    }
                                },
                                {
                                    "type": "link",
                                    "title": null,
                                    "url": "https://example.com",
                                    "children": [
                                        {
                                            "type": "text",
                                            "value": "https://example.com",
                                            "position": {
                                                "start": {
                                                    "line": 2,
                                                    "column": 10,
                                                    "offset": 37
                                                },
                                                "end": {
                                                    "line": 2,
                                                    "column": 29,
                                                    "offset": 56
                                                }
                                            }
                                        }
                                    ],
                                    "position": {
                                        "start": {
                                            "line": 2,
                                            "column": 10,
                                            "offset": 37
                                        },
                                        "end": {
                                            "line": 2,
                                            "column": 29,
                                            "offset": 56
                                        }
                                    }
                                },
                                {
                                    "type": "text",
                                    "value": " a URL",
                                    "position": {
                                        "start": {
                                            "line": 2,
                                            "column": 29,
                                            "offset": 56
                                        },
                                        "end": {
                                            "line": 2,
                                            "column": 35,
                                            "offset": 62
                                        }
                                    }
                                }
                            ],
                            "position": {
                                "start": {
                                    "line": 2,
                                    "column": 5,
                                    "offset": 32
                                },
                                "end": {
                                    "line": 2,
                                    "column": 35,
                                    "offset": 62
                                }
                            }
                        }
                    ],
                    "position": {
                        "start": {
                            "line": 1,
                            "column": 1,
                            "offset": 0
                        },
                        "end": {
                            "line": 2,
                            "column": 35,
                            "offset": 62
                        }
                    }
                }
            ],
            "position": {
                "start": {
                    "line": 1,
                    "column": 1,
                    "offset": 0
                },
                "end": {
                    "line": 2,
                    "column": 35,
                    "offset": 62
                }
            }
        }
    ],
    "position": {
        "start": {
            "line": 1,
            "column": 1,
            "offset": 0
        },
        "end": {
            "line": 2,
            "column": 35,
            "offset": 62
        }
    }
}

wooorm · 2021-05-18T16:13:49Z

However, It seems that gfm plugin can infer the position when parsing valid http://~ URL string.

This is exactly what I described in my previous comment!

I think that it is a bug related to http://.

I think it’s acceptable, because it’s impossible to solve.
GH algorithm 2 works on the AST, and in that phase it’s impossible to know the original positional info

The list + blockquote example was to illustrate, here is an actual working example: https://runkit.com/embed/tyohiws7jhem. There is no positional info on the link, because it has to use algorithm 2, and it would be hard to figure out from the AST information where that link was originally.

There is a bug: http://... should not turn into a link.

azu · 2021-05-20T01:26:00Z

Thanks for the details.

The one at AST-time can’t infer the original position of certain URLs from text nodes, because the AST only has the starting and ending point, but no knowledge in this case

Does it mean that transforming AST to AST?
Previously, I've written a similar AST to AST parser.
This parser walks AST and parses the node's value for getting text offset. The offset is based on the node's starting point.

https://github.com/azu/sentence-splitter/blob/1663ab26ba4ad29d9ab36616a572231a5e9d637c/src/sentence-splitter.ts#L182-L224

I expected that all node has position property.
If some node has not position, we need to always add if(!node.position) { return; /* can not handle */ } statement.
I feel it is an invalid node because it is the same text type but some nodes have not position property.
So I want to remove the invalid node. However, removing the node and will break another one.

Actually, remark@12 has position property on this example.
https://runkit.com/azu/60a5b5b63d9183001a1c5670

Is there a workaround for avoiding creating non-postion node?
It is blocker to update remark-parser, and I need to investigate it.

azu · 2021-05-22T04:21:33Z

I have looked into this.
I think that it is possible, but the current missing some parts for implementation.

We may need following changes to transformGfmAutolinkLiterals.
This is pseudo-code diff for https://github.com/syntax-tree/mdast-util-gfm-autolink-literal/blob/515f945feda9ec57d53ab09dfd458a86e9ff8eb8/from-markdown.js#L81-L86

diff --git a/from-markdown.js b/from-markdown.js
index 8a24bcc..cc02eab 100644
--- a/from-markdown.js
+++ b/from-markdown.js
@@ -82,7 +82,22 @@ function findUrl($0, protocol, domain, path, match) {
     type: 'link',
     title: null,
     url: prefix + protocol + parts[0],
-    children: [{type: 'text', value: protocol + parts[0]}]
+    children: [{
+      type: 'text',
+      value: protocol + parts[0],
+      position: {
+        start: {
+          line: calcLineFromOffset(node.position.start.offset + match.index),
+          column: calcColumnFromOffset(node.position.start.offset + match.index),
+          offset: node.position.start.offset + match.index
+        },
+        end: {
+          line: calcLineFromOffset(node.position.start.offset + match.index + $0.length),
+          column: calcColumnFromOffset(node.position.start.offset + match.index + $0.length),
+          offset: node.position.start.offset + match.index + $0.length
+        }
+      }
+    }]
   }
 
   if (parts[1]) {

Lack of implementation:

mast-util-find-and-replace does not pass the matched node to replace function
- It will needed to infer start offset correctly.
- This is node in pseudo-code
mdast-util-from-markdown does not pass raw text(parsing text) to MdastExtension
- It will needed to calculate line and column from offset
  - structured-source is example of this logics implementation
- ~~Or, get raw text from the node.~~
- This is calcLineFromOffset and calcColumnFromOffset in pseudo-code

azu · 2021-05-22T06:05:26Z

📝 Workaround: disable mdast-util-gfm-autolink-literal transform.

import unified from "unified";
// @ts-ignore
import autolinkLiteral from "mdast-util-gfm-autolink-literal/from-markdown";
// disable autolink transforms
autolinkLiteral.transforms = [];
import remarkGfm from "remark-gfm";
import remarkParse from "remark-parse";
const remark = unified().use(remarkParse).use(remarkGfm);
const ast = remark.parse(`http&#x3A;//user:password@host:port/path?key=value#fragment`);
console.log(JSON.stringify(ast, null, 4));

workaround for remarkjs/remark-gfm#16

* feat(markdown-to-ast): update remark-parse@9 * test: update snapshots * fix: support footnote [^1] again * fix: create range from offset * fix: treat BOM text * chore: rename footnoteReference to FootnoteReference * test: add broken example * test: add broken example * fix(markdown-to-ast): add a workaround to fix broken node workaround for remarkjs/remark-gfm#16

wooorm · 2021-05-22T15:18:45Z

Does it mean that transforming AST to AST?
Previously, I've written a similar AST to AST parser.
This parser walks AST and parses the node's value for getting text offset. The offset is based on the > node's starting point.

Given this markdown:

- > a block quote in a list
    with https://example.com a URL

The text node has this value: a block quote in a list\nwith https://example.com a URL.

It starts at 1:5, and ends at 2:35.

The markdown could look like this:

- > a block quote in a list
with https://example.com a URL

This:

- > a block quote in a list
                                  with https://example.com a URL

This:

:::some custom markdown structure
    a block quote in a list
    with https://example.com a URL
:::

How do you know where the link starts and ends?

I expected that all node has position property.
If some node has not position, we need to always add if(!node.position) { return; /* can not handle > */ } statement.
I feel it is an invalid node because it is the same text type but some nodes have not position property.
So I want to remove the invalid node. However, removing the node and will break another one.

The position field on nodes is optional: https://github.com/syntax-tree/unist#node.
It is valid, and it is heavily used when doing more complex tasks. For example, when processing MDX, the input is markdown-like (remark), which goes through to HTML (rehype), and then compiles to JavaScript (recma).
Several nodes in this process are added (one example), without positional information, because they were not in the original document.

Actually, remark@12 has position property on this example.

Yes it did, but it also didn’t match GFM. To properly match GFM, the two algorithms need to be used: one at parse time, one at “AST time”.

wooorm · 2021-08-15T11:18:54Z

I fixed what seemed like the bug.
However, unfortunately I don’t see a way to “solve” this issue (text nodes w/o position) :(

azu · 2021-08-15T11:49:58Z

Thanks

However, unfortunately I don’t see a way to “solve” this issue (text nodes w/o position) :(

Would it be possible to provide a option that disable mdast-util-gfm-autolink-literal?
https://github.com/remarkjs/remark-gfm#unifieduseremarkgfm-options
If remark-gfm has the option, we can avoid to use a workaround.
(📝 Probably, this workaround will not work with Node.js exports filed)

wooorm · 2021-08-15T12:55:40Z

I don’t see a reason for an option to turn all of mdast-util-gfm-autolink-literal off in mdast-util-gfm or remark-gfm. The whole point of these is that they match GFM. And the separate packages can be used and matched however you please (see § When to use this).

I somewhat see an option to turn algorithm2 off, which is what you do in the workaround, in mdast-util-gfm-autolink-literal, which would be a major (because the API has to change to accept options), but bubble through in mdast-util-gfm and remark-gfm as a minor release.

The reason is that algorithm2 does not work in comments, and thus that option would somewhat match some flavor of GFM. But not 100%, because email links do work in comments: https://gist.github.com/wooorm/076fd173c31ba6837f17591d5932476e.
so this option would not match comments.

I’m not convinced yet of such an option. @ChristianMurphy?
And if it would be added: a question would be how to name it.

(and I think your workaround will work fine with ESM!)

ChristianMurphy · 2021-08-15T19:28:04Z

I don't really see the option as making much sense. 🤔
*-gfm plugins should, to the best of their ability, match GFM, having an option for "be less like GFM" seems to go against its goal. 🤔

Taking a step back, position on Node is optional. This is far from the only case where it can happen. I'd be curious to hear how it's being used, and discussing what alternative(s) exist for handling these cases.

azu added 🐛 type/bug This is a problem 🙉 open/needs-info This needs some more info labels May 12, 2021

azu mentioned this issue May 12, 2021

feat(markdown-to-ast): update to remark-parse@9 textlint/textlint#767

Merged

4 tasks

azu changed the title ~~Some text node has not position property~~ Auto link's text node has not position property May 14, 2021

azu changed the title ~~Auto link's text node has not position property~~ Some text node has not position property May 14, 2021

azu mentioned this issue May 20, 2021

Update to [email protected] textlint/textlint#717

Closed

azu added a commit to textlint/textlint that referenced this issue May 22, 2021

fix(markdown-to-ast): add a workaround to fix broken node

c99218e

workaround for remarkjs/remark-gfm#16

azu mentioned this issue May 22, 2021

[Meta] textlint 12.0.0 textlint/textlint#738

Closed

5 tasks

wooorm mentioned this issue Aug 15, 2021

Links detected in raw broken markdown block compared to GitHub.com syntax-tree/mdast-util-gfm-autolink-literal#4

Closed

4 tasks

wooorm closed this as completed in syntax-tree/mdast-util-gfm-autolink-literal@b02cdaf Aug 15, 2021

This comment has been minimized.

Sign in to view

wooorm added 🙅 no/wontfix This is not (enough of) an issue for this project and removed 🐛 type/bug This is a problem 🙉 open/needs-info This needs some more info labels Aug 15, 2021

github-actions bot added the 👎 phase/no Post cannot or will not be acted on label Aug 15, 2021

ChristianMurphy mentioned this issue Jun 8, 2022

Source position property not available for some autolinks remarkjs/remark#1001

Closed

4 tasks

azu mentioned this issue Sep 20, 2022

Missing dependency from @textlint/markdown-to-ast to mdast-util-gfm-autolink-literal textlint/textlint#903

Closed

wooorm mentioned this issue Oct 9, 2022

Parse www. won't return positions syntax-tree/mdast-util-gfm-autolink-literal#6

Closed

4 tasks

yuzhanglong mentioned this issue Nov 5, 2022

fix: 解析 GitHub Markdown 语法时部分节点缺少 position lint-md/lint-md#93

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some `text` node has not `position` property #16

Some `text` node has not `position` property #16

azu commented May 12, 2021 •

edited

Loading

wooorm commented May 14, 2021

azu commented May 14, 2021 •

edited

Loading

wooorm commented May 18, 2021 •

edited

Loading

azu commented May 20, 2021 •

edited

Loading

azu commented May 22, 2021 •

edited

Loading

azu commented May 22, 2021

wooorm commented May 22, 2021

This comment has been minimized.

wooorm commented Aug 15, 2021

azu commented Aug 15, 2021 •

edited

Loading

wooorm commented Aug 15, 2021 •

edited

Loading

ChristianMurphy commented Aug 15, 2021 •

edited

Loading

Some text node has not position property #16

Some text node has not position property #16

Comments

azu commented May 12, 2021 • edited Loading

Subject of the issue

Your environment

Steps to reproduce

Expected behavior

Actual behavior

wooorm commented May 14, 2021

azu commented May 14, 2021 • edited Loading

wooorm commented May 18, 2021 • edited Loading

azu commented May 20, 2021 • edited Loading

azu commented May 22, 2021 • edited Loading

azu commented May 22, 2021

wooorm commented May 22, 2021

This comment has been minimized.

wooorm commented Aug 15, 2021

azu commented Aug 15, 2021 • edited Loading

wooorm commented Aug 15, 2021 • edited Loading

ChristianMurphy commented Aug 15, 2021 • edited Loading

Some `text` node has not `position` property #16

Some `text` node has not `position` property #16

azu commented May 12, 2021 •

edited

Loading

azu commented May 14, 2021 •

edited

Loading

wooorm commented May 18, 2021 •

edited

Loading

azu commented May 20, 2021 •

edited

Loading

azu commented May 22, 2021 •

edited

Loading

azu commented Aug 15, 2021 •

edited

Loading

wooorm commented Aug 15, 2021 •

edited

Loading

ChristianMurphy commented Aug 15, 2021 •

edited

Loading