Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/token: add IsIdentifier, IsKeyword, and IsExported funcs #30064

Closed
mvdan opened this issue Feb 2, 2019 · 13 comments
Closed

go/token: add IsIdentifier, IsKeyword, and IsExported funcs #30064

mvdan opened this issue Feb 2, 2019 · 13 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@mvdan
Copy link
Member

mvdan commented Feb 2, 2019

We already have go/ast.IsExported, which is helpful; it makes the operation simple and expressive, and it avoids the simple mistake of only checking ASCII and ignoring Unicode.

Checking if a string is a valid identifier as per the spec (https://golang.org/ref/spec#Identifiers) falls under the same category, I believe. It seems intuitive, so developers tend to write a simple implementation by hand without much effort. However, there are many gotchas which are easy to get wrong:

  • Unicode
  • Digits are allowed, but not at the beginning
  • _ is allowed, even as a first character
  • Empty strings are not valid identifiers

For example, take this implementation I just came across this morning:

for _, c := range str {
	switch {
	case c >= '0' && c <= '9':
	case c >= 'a' && c <= 'z':
	case c >= 'A' && c <= 'Z':
	case c == '_':
	default:
		return false
	}
}
return true

It gets three of the gotchas wrong; it would return true on "" and "123", and return false on "αβ". I imagine that a correct implementation would be like:

if len(str) == 0 {
	return false
}
for i, c := range str {
	if unicode.IsLetter(c) || c == '_' || (i > 0 && unicode.IsDigit(c)) {
		continue
	}
	return false
}
return true

However, I'm still not 100% confident that this is correct. I'd probably compare it to implementations in go/* and x/tools before using the code, just in case.

The standard library implements variants of this in multiple places; go/scanner does it on a per-rune basis while scanning, go/build implements it in its importReader, and cmd/doc has an isIdentifier func. The same applies to other official repos like x/tools, where I've found godoc.isIdentifier, semver.isIdentChar, analysis.validIdent, and rename.isValidIdentifier.

I think we should have a canonical implementation with a proper godoc and tests. My proposal is to add this to go/ast, as it would already be useful to other packages within the standard library. I also think this is a fairly low-level API, and it's commonly needed whenever one is modifying Go syntax trees, creating new Go files, or otherwise tinkering with Go syntax.

If adding API to go/ast is a problem, perhaps golang.org/x/tools/go/ast/astutil. However, in my mind that package contains much higher-level functions, so it feels a bit out of place.

I realise this could have been a proposal, but I thought the proposed change wouldn't be controversial enough to warrant one. Feel free to retitle it as a proposal if preferred.

/cc @griesemer @josharian @alandonovan @dominikh

@mvdan mvdan added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Feb 2, 2019
@adonovan
Copy link
Member

adonovan commented Feb 2, 2019

I think this is a fine idea, but the function belongs in go/token, not go/ast.

@mvdan
Copy link
Member Author

mvdan commented Feb 2, 2019

go/token sounds fine; I just suggested go/ast as that's where IsExported is.

@ianthehat pointed out keywords, which I was forgetting about. The spec mentions:

The following keywords are reserved and may not be used as identifiers.

So I presume that IsIdentifier should return false for all Go keywords.

Ian also mentions that go/scanner could be used for this by the user, checking that the entire string scans as a single identifier. I've implemented what I think is a correct way to do it below:

func isIdentifier(str string) bool {
        src := []byte(str)
        var s scanner.Scanner
        fset := token.NewFileSet()
        file := fset.AddFile("", fset.Base(), len(src))
        s.Init(file, src, nil, scanner.ScanComments)
        _, tok, lit := s.Scan()
        return tok == token.IDENT && lit == str
}

However, I still think that an IsIdentifier func in the standard library would be better. Compared to using go/scanner directly, it would be a single line and require zero allocations.

@josharian
Copy link
Contributor

We might consider IsKeyword in go/token as well. In addition to being convenient, the implementation could use a perfect hash for faster checks (as does cmd/compile/internal/syntax).

@mvdan mvdan changed the title go/ast: add func IsIdentifier(string) bool go/token: add func IsIdentifier(string) bool Feb 2, 2019
@adonovan
Copy link
Member

adonovan commented Feb 2, 2019

I'd be happy for us to add IsIdentifier, IsKeyword, and IsExported to go/token.

@mvdan
Copy link
Member Author

mvdan commented Feb 3, 2019

IsKeyword sounds like a great idea; I imagine that IsIdentifier would still return false for inputs like for, though. That is, IsIdentifier would use IsKeyword as part of its implementation.

and IsExported to go/token

Not sure if this is strictly necessary, since it's already part of go/ast. I assume you're suggesting it for the sake of consistency.

@adonovan
Copy link
Member

adonovan commented Feb 3, 2019

I assume you're suggesting it for the sake of consistency.

Exactly. The go/ast function would delegate to the go/token implementation.

@griesemer
Copy link
Contributor

I'm ok with @adonovan said.

@griesemer griesemer added this to the Go1.13 milestone Feb 5, 2019
@griesemer griesemer added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Feb 5, 2019
@griesemer
Copy link
Contributor

Please send CLs to me. Thanks.

@mvdan mvdan self-assigned this Feb 5, 2019
@mvdan mvdan changed the title go/token: add func IsIdentifier(string) bool go/token: add IsIdentifier, IsKeyword, and IsExported funcs Feb 5, 2019
@Skarlso
Copy link
Contributor

Skarlso commented Feb 19, 2019

Hi @mvdan @adonovan @griesemer. Can I work on this one? I don't see a CL attached.

@josharian
Copy link
Contributor

I’m pretty sure @mvdan is working on this; he assigned it to himself.

@Skarlso
Copy link
Contributor

Skarlso commented Feb 19, 2019

@josharian Cool, thanks. :)

@mvdan
Copy link
Member Author

mvdan commented Feb 28, 2019

I will get to it soon - I just have a moderately sized backlog of stuff to go through, now that the tree is open again :)

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/169018 mentions this issue: go/token: add IsIdentifier, IsKeyword, and IsExported

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

6 participants