Fortran grammar for tree sitter #122
Hi! Great to hear that you're working on a Fortran grammar. I wrote a little Fortran 90 myself in college, though I've mostly forgotten the language by now. I think Fortran 90 (free format) should be fairly easy to handle with Tree-sitter. Fixed-format Fortran will probably be more difficult and require an external scanner. External scanners are a feature that lets you add custom C/C++ code to Tree-sitter's generated scanner; the Python grammar uses one to handle Python's indentation-sensitive syntax. I'd suggest ignoring fixed-format Fortran for the time being and concentrating on Fortran 90. As you get started, feel free to ping me about any specific issues you hit. There's definitely a learning curve to Tree-sitter if you haven't used other LR-type parsing tools before, and unfortunately I haven't yet had time to write good documentation; as you said, the existing grammars are currently the best way to understand how to use the tool.
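To make the column sensitivity of fixed form concrete: in fixed-form source, columns 1-5 hold a statement label, a character other than blank or `0` in column 6 marks a continuation line, statement text occupies columns 7-72, and a `C` or `*` in column 1 marks a comment line. The sketch below classifies a single line by those column rules. It is written in JavaScript only for illustration (a real Tree-sitter external scanner would be C/C++), `classifyFixedFormLine` is a hypothetical name that is not part of any existing grammar, and tab formatting and vendor extensions are deliberately ignored.

```javascript
// Hypothetical helper, for illustration only: classify one fixed-form
// Fortran line by column position. Tab formatting and vendor extensions
// are not handled.
function classifyFixedFormLine(line) {
  // A line containing only blanks counts as a comment line.
  if (line.trim() === '') return { kind: 'comment' };

  // 'C', 'c' or '*' in column 1 marks a comment line.
  if ('Cc*'.includes(line[0])) return { kind: 'comment' };

  // '!' starts a comment line unless it is in column 6, where it
  // would be a continuation marker instead.
  const firstNonBlank = line.search(/\S/);
  if (line[firstNonBlank] === '!' && firstNonBlank !== 5) {
    return { kind: 'comment' };
  }

  const label = line.slice(0, 5).trim();          // columns 1-5
  const marker = line.length > 5 ? line[5] : ' '; // column 6
  // Columns 7-72; anything past column 72 is ignored.
  // An inline '!' comment after the statement is left in this text.
  const statement = line.slice(6, 72);

  // Any character other than blank or '0' in column 6 marks a continuation line.
  if (marker !== ' ' && marker !== '0') {
    return { kind: 'continuation', statement };
  }
  return { kind: 'initial', label, statement };
}

console.log(classifyFixedFormLine('C     COMPUTE THE SUM'));
console.log(classifyFixedFormLine('  100 CONTINUE'));
console.log(classifyFixedFormLine('     &  + Y'));
```

Since every decision here depends on exact column positions rather than on the token sequence, this is the kind of logic that would plausibly live in an external scanner rather than in the grammar's regex-based rules.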
That sounds fair; ignoring fixed form in the meantime will certainly help get a minimally functional project out to the community quicker. Does the parser support case-insensitive regexes of the form
Unfortunately, case-insensitive regexes aren't supported right now. We could totally support them. In the meantime, you could approximate them yourself with a helper function:

    // Builds a regex that matches `keyword` in any letter case.
    // Assumes `keyword` is written in lowercase letters.
    function caseInsensitive (keyword) {
      return new RegExp(keyword
        .split('')
        .map(letter => `[${letter}${letter.toUpperCase()}]`)
        .join('')
      )
    }

which you could use like this:

    program: $ => seq(
      caseInsensitive('program'),
      $.identifier,
      // ...
    ),
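Because the helper only builds an ordinary JavaScript `RegExp`, it can be sanity-checked in plain Node without generating a parser. The sketch below repeats the helper and tests the pattern it produces. The `matchesWhole` wrapper is a hypothetical test-only addition that anchors the pattern so it must match the entire string. Note the lowercase assumption: for an uppercase letter, `letter.toUpperCase()` returns the same letter, so `caseInsensitive('PROGRAM')` would only match uppercase input.

```javascript
// Same helper as above: turns 'program' into /[pP][rR][oO][gG][rR][aA][mM]/.
// Assumes the keyword is written in lowercase.
function caseInsensitive(keyword) {
  return new RegExp(keyword
    .split('')
    .map(letter => `[${letter}${letter.toUpperCase()}]`)
    .join('')
  );
}

// Test-only wrapper (hypothetical): anchor the pattern to the whole string.
function matchesWhole(regex, text) {
  return new RegExp(`^(?:${regex.source})$`).test(text);
}

const keyword = caseInsensitive('program');
console.log(keyword.source); // [pP][rR][oO][gG][rR][aA][mM]
for (const word of ['program', 'PROGRAM', 'Program', 'pRoGrAm', 'programs']) {
  console.log(word, matchesWhole(keyword, word));
}
```

The uppercase limitation is what the improved `toCaseInsensitive` variant later in this thread addresses.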
@maxbrunsfeld could you provide a little more guidance on when to use the DSL helpers exported by tree-sitter-cli?

    module.exports = {
      alias: alias,       // I think I get it, but tips on the intended usage here would be nice
      grammar: grammar,   // no questions here
      blank: blank,       // no questions here
      choice: choice,     // no questions here
      err: err,           // seems simple enough, but I'm not sure of the intended use case
      optional: optional, // no questions here
      prec: prec,         // I think I get this one, but tips would be nice as well
      repeat: repeat,     // this seems simple enough: repeat the given rule indefinitely
      repeat1: repeat1,   // why is there this version? is it just to repeat once, as the name suggests?
      seq: seq,           // no questions here
      sym: sym,           // not sure what this means
      token: token        // not sure of the intended use case here either
    };
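As a rough intuition for the quantifier-style helpers: `optional(x)` accepts zero or one `x`, `repeat(x)` zero or more, and `repeat1(x)` one or more, much like `?`, `*` and `+` in a regex, so `repeat1(x)` accepts the same input as `seq(x, repeat(x))`. The sketch below is a toy backtracking matcher over token arrays, written only to make those semantics concrete. Its functions are hypothetical and merely share names with the DSL; Tree-sitter itself compiles a grammar into an LR parse table and does not match input this way.

```javascript
// Toy matcher, for intuition only: it is NOT how Tree-sitter works.
// Each rule maps (tokens, pos) to the positions where a match could end;
// the names only mirror the DSL helpers.
const lit = value => (tokens, pos) => (tokens[pos] === value ? [pos + 1] : []);

const seq = (...rules) => (tokens, pos) =>
  rules.reduce((ends, rule) => ends.flatMap(p => rule(tokens, p)), [pos]);

const choice = (...rules) => (tokens, pos) => rules.flatMap(rule => rule(tokens, pos));

const blank = () => (tokens, pos) => [pos];

// optional: zero or one occurrence.
const optional = rule => choice(rule, blank());

// repeat1: one or more occurrences.
const repeat1 = rule => (tokens, pos) => {
  const ends = [];
  let frontier = rule(tokens, pos);
  while (frontier.length > 0) {
    ends.push(...frontier);
    // Only continue from matches that consumed input, so this always terminates.
    frontier = frontier.flatMap(p => rule(tokens, p).filter(q => q > p));
  }
  return ends;
};

// repeat: zero or more occurrences, i.e. repeat1 or nothing at all.
const repeat = rule => choice(repeat1(rule), blank());

// True if `rule` can consume the whole token list.
const accepts = (rule, tokens) => rule(tokens, 0).includes(tokens.length);

const x = lit('x');
console.log(accepts(repeat(x), []));           // true
console.log(accepts(repeat1(x), []));          // false
console.log(accepts(repeat1(x), ['x', 'x']));  // true
console.log(accepts(optional(x), ['x', 'x'])); // false
console.log(accepts(seq(lit('program'), repeat1(x)), ['program', 'x'])); // true
```

The only difference between `repeat` and `repeat1` shows up on empty input: `repeat` accepts it, `repeat1` does not.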
ping @maxbrunsfeld, see above
I finally started some official docs about creating parsers. There's a section that explains each public function here. There's still a lot that needs to be explained; this is just a start. Let me know what you think of these and what you think needs to be added.
@stadelmanma I'm going to close this issue out. If you have additional questions, you could just
@maxbrunsfeld the documentation is already a great help, thanks!
Uses the suggestion from the following comment: tree-sitter/tree-sitter#122 (comment)
In my parser I use an improved version of the function:

    // Wraps a single ASCII letter in a character class matching both cases;
    // any other character (including regex syntax such as ( ) ?) is returned unchanged.
    function toCaseInsensitive(a) {
      var ca = a.charCodeAt(0);
      if (ca>=97 && ca<=122) return `[${a}${a.toUpperCase()}]`; // a-z
      if (ca>=65 && ca<= 90) return `[${a.toLowerCase()}${a}]`; // A-Z
      return a;
    }

    function caseInsensitive (keyword) {
      return new RegExp(keyword
        .split('')
        .map(toCaseInsensitive)
        .join('')
      )
    }

so I can use it with groups, like:

    procedure_definition: $ => seq(
      caseInsensitive("proc(e(d(u(r(e)?)?)?)?)?"),
      $.identifier,
      $.parameter_list,
      $._endline,
      repeat($.local_list),
      repeat($._statementProc)
    ),
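The improved helper can be checked the same way in plain Node. The sketch below repeats it and confirms that the grouped pattern accepts every abbreviation from `proc` to `procedure` in any case, while `(`, `)` and `?` pass through unchanged and keep their regex meaning; it also handles uppercase keywords. It assumes ASCII keywords, since `toCaseInsensitive` only rewrites `a`-`z` and `A`-`Z`. The `matchesWhole` wrapper is a hypothetical test-only addition that anchors the pattern to the whole string.

```javascript
// Improved helper: wraps only ASCII letters in a character class,
// so regex syntax such as ( ) ? is left untouched.
function toCaseInsensitive(a) {
  const ca = a.charCodeAt(0);
  if (ca >= 97 && ca <= 122) return `[${a}${a.toUpperCase()}]`; // a-z
  if (ca >= 65 && ca <= 90) return `[${a.toLowerCase()}${a}]`;   // A-Z
  return a;
}

function caseInsensitive(keyword) {
  return new RegExp(keyword
    .split('')
    .map(toCaseInsensitive)
    .join('')
  );
}

// Test-only wrapper (hypothetical): anchor the pattern to the whole string.
function matchesWhole(regex, text) {
  return new RegExp(`^(?:${regex.source})$`).test(text);
}

const proc = caseInsensitive('proc(e(d(u(r(e)?)?)?)?)?');
console.log(proc.source); // [pP][rR][oO][cC]([eE]([dD]([uU]([rR]([eE])?)?)?)?)?
for (const word of ['proc', 'PROC', 'Proce', 'procedure', 'PROCEDURE', 'pro', 'procedures']) {
  console.log(word, matchesWhole(proc, word));
}
```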
Hi, I noticed the pull request to use this type of parser for Atom syntax highlighting. Fortran is a language that would greatly benefit from this type of parsing, for many of the same reasons as other lower-level languages such as C and C++. I don't have experience parsing a language from a context-free grammar (CFG), but I have been doing a lot of reading and digging into the existing code base to get a handle on things.
I have set up a repository to start working on the grammar, modeled after the ones already under this group (at the time of this post only the basic dotfiles and the like are in place).
I was going to base it on the Waite/Cordy grammar provided in the Grammar Zoo, since it seemed the easiest to work with. I noticed the C grammar was based on content from the same website, so I thought it would be a good starting point. I'll cross-reference it with the syntax-highlighting grammar defined in the language-fortran package to reduce the odds of missing anything from newer standards. The main thing I'm not sure how to handle is the differences between free-form and fixed-form Fortran.
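For comparison with fixed form, free-form source is not column-based: a `!` starts a comment anywhere outside a character literal, a trailing `&` continues a statement onto the next line, and the continuation line may begin with an optional `&`, after which the statement resumes (which is how a token can be split across lines). Outside character literals, a run of blanks is equivalent to a single blank. The hypothetical `joinFreeFormLines` sketch below applies just those rules to produce logical statements. It assumes `!` and `&` never appear inside string literals and ignores `;` statement separators, so it illustrates the difference rather than serving as a lexer.

```javascript
// Hypothetical helper (illustration only): join free-form Fortran
// continuation lines into logical statements.
// Assumes '!' and '&' never appear inside character literals.
function joinFreeFormLines(lines) {
  const statements = [];
  let pending = null; // statement text still being continued, or null

  for (const raw of lines) {
    // Remove a trailing '!' comment.
    const bang = raw.indexOf('!');
    let text = bang === -1 ? raw : raw.slice(0, bang);
    // Blank and comment-only lines are skipped, even between continuations.
    if (text.trim() === '') continue;

    if (pending !== null) {
      // On a continuation line, a leading '&' is dropped and the statement
      // resumes right after it; otherwise it resumes at the line start.
      const trimmed = text.trimStart();
      if (trimmed.startsWith('&')) text = trimmed.slice(1);
    }

    // A trailing '&' means the statement continues on the next line.
    const trimmedEnd = text.trimEnd();
    const continues = trimmedEnd.endsWith('&');
    if (continues) text = trimmedEnd.slice(0, -1);

    pending = (pending ?? '') + text;
    if (!continues) {
      // Runs of blanks are equivalent to one blank outside character literals.
      statements.push(pending.replace(/\s+/g, ' ').trim());
      pending = null;
    }
  }
  // An unterminated continuation at end of input is kept as-is.
  if (pending !== null) statements.push(pending.replace(/\s+/g, ' ').trim());
  return statements;
}

console.log(joinFreeFormLines([
  'program demo',
  '  x = 1 + &   ! continued below',
  '      2',
  '  call foo(a, &',
  '           & b)',
  'end program demo',
]));
```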
If you have any tips beyond the example in the README on how best to proceed, they would be greatly appreciated. And if there is a better place to pull an existing grammar from, I will gladly use that as the starting point instead.
Cheers!
Matt