Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve index, rindex and indices against string to count the index by utf8 characters #1916

Closed
wants to merge 1 commit into from

Conversation

itchyny
Copy link
Contributor

@itchyny itchyny commented Jun 4, 2019

This is a fix for #1430 and #1624 to resolve confusing byte index.

@itchyny itchyny force-pushed the index-rindex-indices-utf8 branch from bed5f4a to 23301d9 Compare June 4, 2019 02:00
@coveralls
Copy link

coveralls commented Jun 4, 2019

Coverage Status

Coverage decreased (-0.4%) to 84.628% when pulling 23301d9 on itchyny:index-rindex-indices-utf8 into ad9fc9f on stedolan:master.

@itchyny itchyny closed this Jun 5, 2019
@itchyny itchyny deleted the index-rindex-indices-utf8 branch June 5, 2019 00:16
@nicowilliams
Copy link
Contributor

@itchyny why did you close this?

@itchyny
Copy link
Contributor Author

itchyny commented Jun 12, 2019

Because I think this will not be merged in the near future due to this comment #1624 (comment) (I still think it's worth changing the behavior though).

@nicowilliams
Copy link
Contributor

Ah, yes, of course. Thanks.

@emanuele6
Copy link
Member

This is a fix for #1430.

Do we want to merge this, or close that issue?

@nicowilliams
Copy link
Contributor

This is a fix for #1430.

Do we want to merge this, or close that issue?

Great question. String indexing is fraught; maybe we shouldn't have it at all! But if we'll have it we should definitely have:

  • index by nth glyph
  • index by nth codepoint

and maybe:

  • index by nth byte (because it's all UTF-8, and indexing by byte in Unicode-aware apps may make sense)

That we have indexing by nth byte is essentially a design bug born of not having enough Unicode functionality.

@emanuele6
Copy link
Member

@nicowilliams

I think we should change the behaviour of index to make .[index("a"):] work as expected, since string slice indexing is by nth unicode codepoint, otherwise index/rindex/indices are pretty much useless for strings. That is also how index functions work in gojq/fq.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants