-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🚧[Experimenting]: Refactoring marked js🚧 #1102
Conversation
I really don't trust our test bed to accurately reflect the desired behavior and use cases, to be honest. |
After the lexer is run, the parser goes next, the InlineLexer is not used by Marked or the Lexer, only seems to be the Parser.
@intcreator: Interesting findings and notes in here, might help. Maybe we should slow down on trying to fix issues and work on making this thing more readable...?? Or, I'll do that, you do you, and we'll meet in the middle?? |
It looks like the diff got messed up so it's really hard for me to tell what's going on still. As far as fixing vs. making the code legible, I vote we get the builds passing (just fixing #1092) then we can focus on understanding the code more. It's just way easier to check if things are still okay if builds are passing. Edit: I see the builds are passing now so let's work on figuring out what the code does. This may be a good opportunity for creating some unit tests as well. Edit 2: Okay, the diffs are messed up because you're reorganizing the layout of the code. Cool. I'm kind of liking the idea someone had earlier to have different things in different files. I think unit tests will make it easy to see what's going on more clearly too. I'll look at it a bit when I have time and make comments where appropriate. |
* @return string | ||
*/ | ||
function marked(src, opt, callback) { | ||
// ^^ Can't we do something like marked(src, opt = {}, callback = function()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default parameters are ES6 according to MDN. Maybe in 2.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ES5 is gonna probably start to annoy me soon. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can find a transpiler for 1.0.
var highlight = opt.highlight, | ||
tokens, | ||
pending, | ||
i = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to MDN, the comma operator evaluates each operand and returns the value of the last operand. Here, highlight = opt.highlight
, tokens
, pending
, and i = 0
are all being evaluated and 0
is being returned (and not being used).
This is a somewhat obscure way of declaring tokens
, pending
, and i
as variables in addition to highlight
. Here is a reference on Stack Overflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concur with the assessment of the commenter. Wonder if we can add that to the lint (@UziTech)? Don't do this. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I hate this style, but that is how chjj did it. I prefer initializing the variables as close to when you use them as possible.
In this case since the =
operator has a higher presedence than the ,
operator the expression is essentially:
var (highlight = opt.highlight),
(tokens),
(pending),
(i = 0);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be fair, I don't know who necessarily did it that way, just that I think it should be changed. Totally agree that putting things as close to their contextual relevance is kind of a big deal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There actually seems to be a lot of "clever" decisions that were made, which may not actually bring the desired ROI compared to writing human readable code. We spend way more time reading than we do writing, you know?
@intcreator: On the diff, don't know if this is what you're referring to exactly, but the moving |
// | ||
// TODO: Don't some highlighter plugins allow for `js`, effectively being less than three | ||
// despite wanting to have the highlighter run?? | ||
if (!highlight || highlight.length < 3) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are better tests than the length of the highlight. Agree with the possibility of js
. Will we be supporting syntax highlighting of languages such as c
or r
? Highlighting doesn't appear to be in the CommonMark spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. See #1043
} | ||
|
||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is almost 100 lines long. I highly recommend we break it down into pieces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tagging #1043 so UziTech can get a better feel on internal pull for the updates to the test harness request. No pressure. Just informational purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meant #1103
* @return boolean | ||
*/ | ||
marked.receivedOnlySource = function(src, opt, callback) { | ||
return (marked.receivedCallback(opt, callback)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be return !(marked.receivedCallback(opt, callback));
only source means we did not receive those other things.
Strictly speaking, is there anything about ES5 (or some other gotcha) that stops us from separating |
.getRegex(); | ||
function Parser(options) { | ||
this.tokens = []; | ||
this.token = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is having it be null
necessary? Given this becomes an object with members might be easier for a new person to understand if they saw something like this (or make a Token object):
{
type: '',
...
}
Was going to make a more complete example; however, it appears the Token is just a random set of members...like an undefined type. You can see it in the switch
inside Parser.prototype.tok()
~ line 930
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something more akin to this from traditional OO languages...using Swift highlighting, but not the syntax, strictly speaking.
abstract class Token {}
class Heading extends Token {}
// then the switch could be something like
switch (typeof this.token.type) {
case 'Heading':
...
But, if the type
member is the only thing in common between all the different variations for Token...then there's really no reason for the type
member or Token. ??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there are more "hidden" objects inside of here.
} else if (this.options.pedantic) { | ||
this.rules = inline.pedantic; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Renderer is just below here, it's essentially a string concatenation object...there's a separate one for both inline and block elements...think it could actually be much smaller.
When we get there, I'll probably want to work on this one...leverage the work from here and here. More the former than the latter since we need to accommodate client-side and most likely we are the extension of someone else's package, being used in, yet again, someone else's project.
We really should try to make Marked compile to something tiny - that's probably why some of the "clever" decisions were made, in fact. Not sure if the project started prior to the prevalence of minify for JS.
[edit] There's also the inability to minify the "boilerplate" mentioned by @UziTech in #746 - "prototype" can't get minified...and neither can any of the JS standard library calls and JS base language stuff.
It seems like what Marked is trying to do is sound, but it never got the chance to iterate that far...could be totally wrong here, but looking at the code, it's starting to feel familiar based on other projects: M⬇ --- parse ---> {} --- render ---> HTML As we consider the future refactoring and unit tests, might be worth keeping the following HTML element type definition in mind: {
// h1, h2, etc.
elem: string,
// the attribute list, which is usually a set of key-value pairs, JSON, in JS,
// where you can easily convert keys to strings to generate the desired `key=value`
// string output.
attr: array,
// this gets tricky because some HTML elements can only take a string; or, there's the
// child element problem (what marked seems to use "inline" for). If we create a text
// object that only holds a string we can always pass in an array of element objects
// (what we seem to call tokens at the moment).
inner: string|array,
// this phrasing comes from the HTML5 spec when looking at the definition of an
// individual element, the first time I created this type definition I called
// `isSelfClosing` but that proved confusing.
omitEndTag: boolean
} This is almost the model used by React, I think, but not explicitly just through their Maybe we need an Element object?? Then the parser can just keep creating those. Various Markdown flavors are also only able to use so many of the global attributes afforded by HTML5 that we might be able to do some optimization there as well. |
Closing as edification was achieved, conflicts, and will most likely happen over a series of PRs not just one. |
Marked version: 0.3.17
Description
Experimenting with the codebase to learn more and try to make marked easier to read without fundamentally breaking things.
Again, this is experimental, please do not merge!
Having said that, this should demonstrate how we can effectively alter how easy it is to read marked without actually changing the functionality. Please see code comments for questions I have regarding why certain decisions were made and whatnot.
I've also noted areas where I do not know if it is possible to test a unit of functionality with our current testing harness; therefore, tagging @UziTech.