Document special collations #1471

Closed
bdarnell opened this issue May 26, 2017 · 8 comments · Fixed by #8523
Labels
O-external Origin: Issue comes from external users. P-2 Normal priority; secondary task T-missing-info

Comments

@bdarnell
Contributor

Our collation docs link to a list of languages that we support, but there's no mention of other modifiers. One useful modified locale is `en_u_ks_level2`, which is case-insensitive English (I think the `_u_ks_level2` modifier can be added to other languages too, to use their case-sensitivity rules). I'm not sure if there are other useful modifiers that are supported by our collation package. These modifiers are standardized, but the references I've found don't seem to agree with each other. Unicode TR 35 looks like the authority on this, although it doesn't define the `level2` syntax. `collate/option.go` is where these modifiers get parsed.
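To make the structure of such a locale name concrete, here is a minimal sketch that splits a name like `en_u_ks_level2` into its base language and the key/value settings that follow the `u` marker. The function name `parse_collation_locale` and the `KNOWN_KEYS` table are hypothetical and only illustrate the naming scheme; this is not how `collate/option.go` is implemented. Multi-part values are simplified so that the last token wins.

```python
# Collation-related keys of the Unicode "u" locale extension (Unicode TR 35).
KNOWN_KEYS = {
    "ks": "strength",
    "kc": "case level",
    "kn": "numeric ordering",
    "kb": "backwards secondary ordering",
    "ka": "alternate handling",
}


def parse_collation_locale(name):
    """Return (base_language, settings) for a name such as 'en_u_ks_level2'.

    Hypothetical helper for illustration only.
    """
    # Accept both the underscore form used in this issue and BCP 47 hyphens.
    parts = name.replace("-", "_").split("_")
    lowered = [p.lower() for p in parts]
    if "u" not in lowered:
        return "_".join(parts), {}

    split_at = lowered.index("u")
    base = "_".join(parts[:split_at])

    settings = {}
    key = None
    for token in lowered[split_at + 1:]:
        if len(token) == 2:
            # Two-character tokens are keys; a bare key stands for "true".
            key = token
            settings[key] = "true"
        elif key is not None:
            # Longer tokens are the value of the most recent key.
            settings[key] = token
    return base, settings


base, settings = parse_collation_locale("en_u_ks_level2")
for key, value in settings.items():
    print(base, KNOWN_KEYS.get(key, "unknown key"), value)
```

Read this way, `en_u_ks_level2` is the base language `en` with the strength key `ks` set to `level2`.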

@jseldess
Contributor

@sploiselle, since you documented collations originally, I suspect this would be a very quick update for you. Feel free to unassign if you feel it's more involved and needs to wait until later.

@sploiselle sploiselle added this to the 1.1 milestone Aug 30, 2017
@jseldess
Contributor

Helpful supplementary thread on the forum: https://forum.cockroachlabs.com/t/case-insensitive-collations/926/4

@jseldess
Contributor

From @justinj, copied over from #2430:

I actually have no idea how this stuff works, but there are some magical collations that just seem to exist, like `en_u_ks_level1`, which is English text ignoring case. I can't find anything in our docs on this sub-language for specifying additional collation rules, but it seems WAY too useful not to have documented on the collations page (even just for the special case of case-insensitivity).

This page seems to have some info on it.
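For readers comparing the `level1` and `level2` names that appear in this thread: in the Unicode collation model, strength level 1 compares base letters only, so it ignores accents as well as case, while level 2 ignores case but still distinguishes accents. The sketch below approximates that difference with Python's standard library. `toy_key` and `strip_accents` are hypothetical helpers, and case folding plus accent stripping is only a rough stand-in for real collation tables.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def toy_key(text, level):
    """Approximate comparison key for a given strength level.

    Level 1 ignores accents and case, level 2 ignores case only,
    and any other level compares the text as given.
    """
    if level == 1:
        return strip_accents(text).casefold()
    if level == 2:
        return unicodedata.normalize("NFD", text).casefold()
    return text


# Level 2: case differences vanish, accent differences remain.
print(toy_key("Résumé", 2) == toy_key("résumé", 2))
print(toy_key("resume", 2) == toy_key("résumé", 2))

# Level 1: both case and accent differences vanish.
print(toy_key("resume", 1) == toy_key("RÉSUMÉ", 1))
```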

cc @mjibson
cc @knz because it seems you reviewed some of the PRs for collations.

@jseldess
Contributor

This goes back to 1.1, but moving it to the 2.1 milestone.

@jseldess jseldess removed this from the 2.1 milestone Oct 30, 2018
@jseldess jseldess added the O-external Origin: Issue comes from external users. label Nov 12, 2018
@rmloveland rmloveland added the P-2 Normal priority; secondary task label Jan 17, 2019
@bdarnell
Contributor Author

bdarnell commented Oct 3, 2019

PostgreSQL 12 just added support for these collations, which they call "non-deterministic collations": https://www.postgresql.org/docs/12/collation.html

Their syntax for this is different from ours, and they just point to the Unicode docs for more detail. We should be sure to include the phrase "non-deterministic collations" when we document this, for SEO purposes.

@ericharmeling ericharmeling added P-1 High priority; must be done this release and removed P-1 High priority; must be done this release labels Sep 30, 2020
@ericharmeling ericharmeling self-assigned this Sep 30, 2020
@ericharmeling
Contributor

@bdarnell

Their syntax for this is different from ours, and they just point to the Unicode docs for more detail. We should be sure to include the phrase "non-deterministic collations" when we document this, for SEO purposes.

Do we support both deterministic and non-deterministic collations?

@bdarnell
Contributor Author

bdarnell commented Oct 1, 2020

Yes, we have both deterministic and non-deterministic collations. But looking back at this, I think these terms are fairly obscure: users don't need to care whether a collation is deterministic or not. The only real reason to use these terms is for a user going from "Postgres 12 introduces support for non-deterministic collations" to searching for whether CockroachDB supports non-deterministic collations.
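As a rough illustration of the terminology: the PostgreSQL docs describe a deterministic collation as one that treats two strings as equal only if they are the same byte sequence, whereas a non-deterministic collation may treat strings with different bytes as equal. The sketch below models this with hypothetical helper functions, using `str.casefold` as a stand-in for a case-insensitive collation; it is not CockroachDB's or PostgreSQL's implementation.

```python
def nondeterministic_equal(a, b):
    """Equality under a case-insensitive comparison only."""
    return a.casefold() == b.casefold()


def deterministic_sort_key(text):
    """Case-insensitive ordering with a byte-wise tie-breaker.

    The second element guarantees that only byte-identical strings
    produce identical keys.
    """
    return (text.casefold(), text.encode("utf-8"))


def deterministic_equal(a, b):
    """Equality that holds only for byte-identical strings."""
    return deterministic_sort_key(a) == deterministic_sort_key(b)


# Non-deterministic: different bytes, still considered equal.
print(nondeterministic_equal("Hello", "hello"))

# Deterministic: the tie-breaker keeps the two strings distinct.
print(deterministic_equal("Hello", "hello"))

# Sorting stays case-insensitive first, then falls back to byte order.
print(sorted(["b", "A", "a"], key=deterministic_sort_key))
```

From a user's point of view, the practical difference is whether `'A' = 'a'` can be true under the chosen collation.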

@ericharmeling
Contributor

The PG docs talk about nondeterministic collations in the context of CREATE COLLATION (deterministic=false), which I don't believe we support.

So I'm still a little confused about how best to document this. For now, I can go ahead and just add a generic note that we support both deterministic and nondeterministic collations. (see #8523)

5 participants