Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/gosh: refactor into a multi-purpose tool #330

Closed
mvdan opened this issue Nov 30, 2018 · 7 comments
Closed

cmd/gosh: refactor into a multi-purpose tool #330

mvdan opened this issue Nov 30, 2018 · 7 comments

Comments

@mvdan
Copy link
Owner

mvdan commented Nov 30, 2018

shfmt has grown in popularity, since it's a useful tool with a unique name. However, its purpose is also narrow - it simply formats shell programs.

This is a pity, because this repository holds plenty of features beyond parsing and printing shell code. I propose that gosh be turned into a shell "swiss army knife", with features such as:

There would still be a place for shfmt, of course. It's well established, much simpler and smaller, and essentially a different tool. If I could go back in time, perhaps I'd have published it as just gosh format, but that's out of the question now.

The current gosh shell could be renamed to gosh-shell, since it's mostly meant as a debugging and experimental tool. We can't have that be only gosh shell, as the only way to configure the system shell is to use a path to a binary.

@mvdan mvdan added this to the 3.0 milestone Nov 30, 2018
@mvdan
Copy link
Owner Author

mvdan commented Nov 30, 2018

Since we're working on v3, I'd likely also remove shfmt -f and shfmt -tojson in favor of their new counterparts. They're rarely useful as part of shfmt, and if anyone needs that advanced usage, they can always just install gosh.

@rob-myers
Copy link

rob-myers commented Dec 1, 2018

Since we're working on v3, I'd likely also remove shfmt -f and shfmt -tojson in favor of their new counterparts.

Concerning gosh parse, I intend to contribute typings for the javascript package e.g.

  /**
   * ArrayElem represents a Bash array element.
   */
  type ArrayElemGeneric<Base, Pos, Op> = Base & {
    type: 'ArrayElem';
    /** [i]=, ["k"]= */
    Index: null | ArithmExprGeneric<Base, Pos, Op>;
    Value: WordGeneric<Base, Pos, Op>;
    Comments: CommentGeneric<Base, Pos, Op>[];
  }

They should agree with the output of gosh parse, so I'll describe my approach.

  1. All field names are preserved.
  2. Getters are recursively replaced by their value.
  3. Undefined/unset fields are added with null value e.g. ArrayElem.Index.
  4. Each node has a type field with value e.g. 'ArrayElem'.
  5. Op is left as a number e.g. 6 rather than { name: 'sglQuote', value: '\'' }.
  6. Malformed positions are preserved e.g. CallExpr.Semicolon can be { Line: 0, Offset: 0, Col: 0 }.

@mvdan
Copy link
Owner Author

mvdan commented Dec 1, 2018

All field names are preserved.

The only clarification I'd make here is that I'd squash anonymous fields. For example, File has an anonymous StmtList field, meaning that one can do file.StmtList.Stmts[0], but most users would do file.Stmts[0]. We can't support both in JS/JSON, so I'd likely just go with the shorter and less deep fields.

Getters are recursively replaced by their value

We can't have funcs in JSON, so I agree. I'm not sure about the JS package though. Would be nice if the methods reflected any changes made to the fields. But methods might be too complex to do if we do the translation from Go to generic JS/JSON objects ourselves.

Undefined/unset fields are added with null value

Agreed, as far as nil-able fields go. That's all pointers and interfaces.

Each node has a type field with value

Agreed. I went with Type in shfmt -tojson for consistency with the other fields, but it might be better to differentiate which fields are not from the syntax tree definitions.

Op is left as a number

I was thinking of leaving operators as strings, actually. In Go, they're numbers simply because that's the simplest way to describe enums. Operators are essentially strings, so the string would need to be in the JSON somewhere.

Malformed positions are preserved

If you mean missing positions like a statement's semicolon position, sure. I left them out of shfmt -tojson simply to avoid verbosity. But with gosh parse, we could always add a "short" flag to make the output less verbose for humans.

@rob-myers
Copy link

The only clarification I'd make here is that I'd squash anonymous fields.

Sure. I do too when I translate your trees into my interpreter's format.

We can't have funcs in JSON, so I agree. I'm not sure about the JS package though.

Forgot to mention I leave them as an optional property i.e.

  • FollowedByElif?(): boolean;
  • ExpandBraces?(): WordGeneric<Base, Pos, Op>[];.

I went with Type in shfmt -tojson for consistency with the other fields.

I am happy to use Type instead of type.

I was thinking of leaving operators as strings, actually.

I got it wrong. I translate them to value || name so that 6 becomes '\'' whereas 0 becomes 'illegalTok'. Presumably 0-5 are internal tokens.

If you mean missing positions like a statement's semicolon position, sure. I left them out of shfmt -tojson simply to avoid verbosity.

I'd prefer them to have a null value. I didn't do this because I wanted to make as few changes as possible. Generally speaking I prefer to see which fields don't contribute, and their value should be null.

@mvdan
Copy link
Owner Author

mvdan commented Dec 1, 2018

Forgot to mention I leave them as an optional property

Are those methods/functions? If so, are they implemented in terms of the methods on the JS objects that the current JS package gives you?

I am happy to use Type instead of type

I honestly don't mind. Using type sounds fine.

Someone on Hacker News suggested a while ago that I should replace all of my CamelCase names with snake_case in the JS package, to conform better to JS. I personally think that keeping the original names is better, since then one can look at the Go documentation almost directly.

Presumably 0-5 are internal tokens.

I think there's an important distinction to make here. There are lots of lexer tokens, some of which are then surfaced as operators in the syntax tree. << and + are both tokens and operators, but ' and _Lit (a literal string) are only tokens. These names and numbers should never surface to the final user.

The only reason why the operator enum values are a subset of the token enums is for simplicity; conversion back and forth is just a type conversion.

So the JS package should only expose operators, not tokens. If we encounter any invalid value like 0, we could use an empty string. The Go package doesn't expose an "invalid token" name or string, so I don't know what else we could use. An empty string can never be a token or operator, so that seems fair.

I'd prefer them to have a null value.

You mean always, or just in the "short" mode?

@rob-myers
Copy link

Are those methods/functions? If so, are they implemented in terms of the methods on the JS objects that the current JS package gives you?

Yes, they refer to the original functions. The typings I'm talking about don't exist at runtime. They are intended to describe what is in your javascript package, and are very helpful during development. They differ slightly because I mutate your data structures slightly: adding type and enforcing null values. They are parametric in Pos, which in your js package has type:

  interface Pos {
    type: 'Pos';
    /**
     * After reports whether this position p is after p2. It is a more expressive version of p.Offset() > p2.Offset().
     */
    After(p2: Pos): boolean;
    /**
     * Col returns the column number of the position, starting at 1. It counts in bytes.
     */
    Col(): number;
    /**
     * IsValid reports whether the position is valid. All positions in nodes returned by Parse are valid.
     */
    IsValid(): boolean;
    /**
     * Line returns the line number of the position, starting at 1.
     */
    Line(): number;
    /**
     * Offset returns the byte offset of the position in the original source file. Byte offsets start at 0.
     */
    Offset(): number;
    String(): string; 
  }

They are also parametric in Op (a number is your js package, but not in the API). I reuse the generic typings for the JSON format, changing Pos and Op.

I personally think that keeping the original names is better, since then one can look at the Go documentation almost directly.

Agreed. The typings can't change the names anyway, they are merely a reflection.

So the JS package should only expose operators, not tokens. If we encounter any invalid value like 0, we could use an empty string. The Go package doesn't expose an "invalid token" name or string, so I don't know what else we could use. An empty string can never be a token or operator, so that seems fair.

This problem arises because I need to convert an instance of an integer-valued enum into a string. Inasmuch as this conversion is needed, I use a line-by-line translation of your enum into an array:

export class ParseShService {

  private readonly opMetas: {
    name: string;
    value: null | string;
  }[] = [
    { name: 'illegalTok', value: null },
    { name: '_EOF', value: null },
    { name: '_Newl', value: null },
    { name: '_Lit', value: null },
    { name: '_LitWord', value: null },
    { name: '_LitRedir', value: null },
    { name: 'sglQuote', value: '\'' },
    { name: 'dblQuote', value: '"' },
    { name: 'bckQuote', value: '`' },
    /** etc. */

I have to use optMetas[i].value || optMetas[i].name to ensure type string (rather than null | string), even though at runtime this never arises. Btw I use typescript with strict null checks e.g. type string does not admit value null.

Alternatively you could provide a utility function in your original package, exposing it in your js package. Even if GopherJS provided String() it would yield e.g. dblQuote, whereas I actually need the respective symbol ".

I'd prefer them to have a null value.

You mean always, or just in the "short" mode?

My preference would be to always explicitly include null values i.e. remove the ambiguity of being unset or having value undefined. Adding a short mode means additional typings, so perhaps I'm just being lazy.

@mvdan
Copy link
Owner Author

mvdan commented Jan 12, 2020

I've decided not to do this. Reasons:

  • gosh is an overused name already. It would be an uphill battle to try to "get" that unique name in package managers.
  • shfmt is already a well established name, packaged in dozens of systems. Do we need another package? Probably not.
  • Noone has shown particular interest in this idea, besides myself.

The ideas that I think are still worth doing, like -tojson and -fromjson, might be added to shfmt directly.

@mvdan mvdan closed this as completed Jan 12, 2020
@mvdan mvdan removed this from the 3.1.0 milestone Jan 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants