Fortran grammar for tree sitter #122

Closed
stadelmanma opened this issue Jan 4, 2018 · 9 comments

Comments

@stadelmanma

Hi, I noticed the pull request to use this type of parser for Atom syntax highlighting. Fortran would benefit greatly from this kind of parsing, for many of the same reasons as other lower-level languages such as C and C++. I am not experienced with parsing a language from a CFG, but I have been doing a lot of reading and digging into the existing code base to get a handle on things.

I have set up a repository to start working on the grammar, modeling it after the ones already under this group (at the time of this post, only the basic dotfiles and the like are in place).

I was going to base it on the Waite/Cordy grammar provided in the Grammar Zoo, since it seemed the easiest to work with. I noticed the C grammar was based on content from the same website, so I thought it would be a good starting point. I'll cross-reference this with the syntax highlighting grammar defined in the language-fortran package to reduce the odds of missing anything from newer standards. The main thing I am not sure how to handle is the difference between free-form and fixed-form Fortran.

If you have any tips beyond the example in the README on how best to proceed, they would be greatly appreciated. Or, if there is a better existing grammar to start from, I will gladly use it as the starting point instead.

Cheers!
Matt

@maxbrunsfeld
Contributor

maxbrunsfeld commented Jan 4, 2018

Hi! Great to hear that you're working on a Fortran grammar. I wrote a little Fortran 90 myself in college, though I've mostly forgotten the language by now.

I think Fortran 90 (free format) should be fairly easy to handle with Tree-sitter. Fixed-format Fortran will probably be more difficult and require the use of an external scanner. External scanners are a feature that allows you to add custom C/C++ code to Tree-sitter's generated scanner. The Python grammar uses one in order to handle Python's indentation-sensitive syntax.

I'd suggest ignoring fixed-format Fortran for the time being and concentrating on Fortran 90. As you get started, feel free to ping me on any specific issues that you hit. There's definitely a learning curve to using Tree-sitter if you haven't used other LR-type parsing tools before, and unfortunately I haven't yet had the time to write good documentation; as you said, the other existing grammars are currently the best way to understand how to use the tool.

@stadelmanma
Author

That sounds fair. Ignoring fixed form in the meantime will certainly help get a minimally functional project out to the community more quickly. Does the parser support case-insensitive regexes of the form /program/i? All of Fortran is case-insensitive, so otherwise we will get stuck with awkward patterns like /[Pp][Rr][Oo][Gg][Rr][Aa][Mm]/.

@maxbrunsfeld
Contributor

Unfortunately, case-insensitive regexes aren't supported right now. We could totally support them. In the meantime, you could approximate them yourself with a helper function:

function caseInsensitive (keyword) {
  return new RegExp(keyword
    .split('')
    .map(letter => `[${letter}${letter.toUpperCase()}]`)
    .join('')
  )
}

which you could use like this:

program: $ => seq(
  caseInsensitive('program'),
  $.identifier,
  // ...
),
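
To see what the helper actually produces, here is a small standalone check in plain Node.js (no Tree-sitter required). It is only a sketch and assumes the keyword is passed as lowercase letters; uppercase input or regex metacharacters would not expand as intended. The `whole` regex and the sample words are illustrative additions.

```javascript
// Same helper as above, repeated so this snippet runs on its own.
function caseInsensitive (keyword) {
  return new RegExp(keyword
    .split('')
    .map(letter => `[${letter}${letter.toUpperCase()}]`)
    .join('')
  )
}

const program = caseInsensitive('program')

// Each lowercase letter becomes a two-character class.
console.log(program.source) // [pP][rR][oO][gG][rR][aA][mM]

// The generated pattern is unanchored, so anchor it for whole-word checks.
const whole = new RegExp(`^${program.source}$`)
console.log(whole.test('PROGRAM'), whole.test('Program'), whole.test('progrom'))
```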

@stadelmanma
Author

@maxbrunsfeld could you provide a little more guidance on when to use the DSL helpers exported by tree-sitter-cli?

module.exports = {
  alias: alias, // I think I get it but tips on the intended usage here would be nice
  grammar: grammar, // no questions here
  blank: blank, // no questions here
  choice: choice, // no questions here
  err: err, // seems simple enough but not sure of the intended use case?
  optional: optional, // no questions here
  prec: prec, // I think I get this one but tips would be nice as well
  repeat: repeat, // this seems simple enough, repeat the given rule indefinitely 
  repeat1: repeat1, // why is there this version? is it just to repeat once as the name suggests? 
  seq: seq, // no questions here
  sym: sym, // not sure what this means
  token: token // not sure of the intended use case here either
};

@stadelmanma
Author

ping @maxbrunsfeld, see above

@maxbrunsfeld
Contributor

I finally started some official docs about creating parsers. There's a section that explains each public function here. There's still a lot that needs to be explained; this is just a start. Let me know what you think of these, and what you think needs to be added.
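
On the `repeat` / `repeat1` question specifically: `repeat(rule)` matches zero or more occurrences of a rule, while `repeat1(rule)` matches one or more, much like `*` and `+` in a regex. The snippet below is a loose analogy using plain JavaScript regexes, not Tree-sitter code; the pattern names are made up for illustration.

```javascript
// Analogy only: regex quantifiers standing in for the grammar DSL.
const zeroOrMore = /^(?:ab)*$/ // roughly like repeat(seq('a', 'b'))
const oneOrMore = /^(?:ab)+$/  // roughly like repeat1(seq('a', 'b'))

// The empty string is accepted by "zero or more" but not by "one or more".
console.log(zeroOrMore.test(''), oneOrMore.test(''))

// Two repetitions are accepted by both.
console.log(zeroOrMore.test('abab'), oneOrMore.test('abab'))
```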

@maxbrunsfeld
Contributor

maxbrunsfeld commented Mar 2, 2018

@stadelmanma I'm going to close this issue out. If you have additional questions, you could just @ mention me on issues/PRs in tree-sitter-fortran. It's very cool to see the progress you've made so far!

@stadelmanma
Author

@maxbrunsfeld the documentation is already a great help, thanks!

jrsconfitto added a commit to jrsconfitto/tree-sitter-powershell that referenced this issue Mar 20, 2018
dgarroDC added a commit to dgarroDC/tree-sitter-ldpl that referenced this issue Apr 18, 2019
@APerricone

APerricone commented Apr 26, 2019

In my parser I use an improved version of the function:

function toCaseInsensitive(a) {
  var ca = a.charCodeAt(0);
  if (ca>=97 && ca<=122) return `[${a}${a.toUpperCase()}]`;
  if (ca>=65 && ca<= 90) return `[${a.toLowerCase()}${a}]`;
  return a;
}

function caseInsensitive (keyword) {
  return new RegExp(keyword
    .split('')
    .map(toCaseInsensitive)
    .join('')
  )
}

so I can use it with groups, like:

    procedure_definition: $ => seq(
      caseInsensitive("proc(e(d(u(r(e)?)?)?)?)?"),
      $.identifier,
      $.parameter_list,
      $._endline,
      repeat($.local_list),
      repeat($._statementProc)
    ),
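
As a quick sanity check of the grouped pattern, this standalone Node.js snippet (no Tree-sitter needed) shows which spellings it accepts. It is a sketch: the `^...$` anchoring and the sample words are added for illustration only.

```javascript
function toCaseInsensitive (a) {
  const ca = a.charCodeAt(0)
  if (ca >= 97 && ca <= 122) return `[${a}${a.toUpperCase()}]` // a-z
  if (ca >= 65 && ca <= 90) return `[${a.toLowerCase()}${a}]`  // A-Z
  return a // parentheses, '?', etc. pass through unchanged
}

function caseInsensitive (keyword) {
  return new RegExp(keyword.split('').map(toCaseInsensitive).join(''))
}

// Accepts "proc" plus any longer prefix of "procedure", in any letter case.
const proc = caseInsensitive('proc(e(d(u(r(e)?)?)?)?)?')
const whole = new RegExp(`^(?:${proc.source})$`)

for (const word of ['proc', 'PROCED', 'Procedure', 'pro', 'procdure']) {
  console.log(word, whole.test(word))
}
```

Because every letter is expanded, regex escapes such as `\d` would be rewritten too, so this approach suits literal keywords with simple grouping.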
