-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: Support CITEXT
type alias for case-insensitive text
#22463
Comments
Good idea. Note we'll have to make them interact with collations properly as well, which seems slightly confusing:
|
Yeah. That doesn't seem too bad, though. Just attach |
This might be a good starter project. |
I need to start work on this issue , please any one give me hints how to get started. |
@knz @jordanlewis |
👋 @bdarnell, I was thinking about tackling this issue... but I'm also new to the codebase. I'm assuming you want to avoid a hacky approach, like looking for the string From reading the tech notes, it seems like I'd have to modify the I don't feel like I quite have it right, but I wanted to ask before getting into a rabbit hole |
Ben is on parental leave and I can help instead. The technical approach you have identified is sound and seems like a good direction to explore. |
So I've looked at implementing this feature for a bit, and had a few questions on intended functionality. If we naively alias (this is an example logictest)
This is something I have implemented currently, and I can get that logic test passing. However, it's not quite how it works in Postgresql, which would be like so:
This seems a bit more tricky to implement. If anyone has any thoughts on which way to go, please let me know! |
Hi @shaqq; first of all, many thanks for your preliminary investigation and clarification of the problem. First of all a clarification: how did you implement the aliasing? It sounds like you made With your current implementation a CITEXT column would get the same OID as Now you are suggesting that CITEXT in PostgreSQL has no object ID. This seems surprising - there is always some ID reported for any data type. What happens if you do It may be that pg allocates something else than the oid for I agree that the rest of the investigation here needs to create certainty on this topic. The OID of the column's type, if not It's also important to check whether the OID is stable across postgres installations. The Finally, I would like to understand what's up with the last query. How come your implementation supports |
Right exactly - PG returns returns an OID for CITEXT, but it's presumably the behavior for any For example, on my PG instance (9.6.16), it has an OID of 51743 in my That suggests that what you're saying here is true:
I can look into creating a static, custom OID for a As for my previous implementation, it was pretty straightforward - I added syntax in However, as we're discovering, this sort of naive aliasing isn't the right way to go. I can look into creating a custom type just for CRDB with a custom OID. Maybe I get something working in a branch and then we go from there? |
We have a way in the code to assign a different (specific) OID to a type alias. See for example the difference between the objects When it comes to allocating a fixed OID, I am generally favorable as long as there's a range of values known to be "reserved for extensions" in postgresql. Also, if/when you add a specific OID for CITEXT, this will need to be reported in the various In any case this discussion made me realize this is not exactly "easy" or a "good first issue". We did not know this before you started investigating, but our apologies for that nonetheless. We'll appreciate if you want to continue investigate further, but we'd also understand if you felt overwhelmed and wanted to call it a day. |
Ha no worries! It's a good way for me to get acquainted with the CRDB codebase. Let me take another whack at it and see what I come up with. After that, if it's really far from where we want it to be and someone else wants to take over, I can move onto something else. Thanks for the tips! |
Hm, I've spun my wheels on this for a bit and not sure where to go. I can't quite find where in the codebase we do comparisons. For instance, for the following query:
Where does the comparison of In any case, I'm thinking about moving onto something else, so if someone else wants to take on this feature, feel free! |
Hey @shaqq, the problem you're running into is that, in your query, we don't actually have to perform comparison because the Try your query again without the PRIMARY KEY definition - that should hit the Compare function. |
This would be a very useful feature as long as the collation derivation would be implicit like in Postgres and not explicit as it is currently.
|
I just want to add a note here that "just adding an alias to a different COLLATE" does not seem to fix all the issues. I am converting a codebase from MySQL to CockroachDB, and when I tried to use Are there any other easy fixes for case-insensitive indexes? |
This is also needed for Elixir/phoenix/ecto/postgrex. Eg
|
While this is true, it is generating a migration for Postgres (I'm not sure if it creates a different migration depending on the ecto adapter provided). You are free to edit the migration before running it to match your used database. |
We have marked this issue as stale because it has been inactive for |
I would still very much like CITEXT. I have been using indexes with |
CITEXT
type aliasCITEXT
type alias for case-insensitive text
We have limited support for case-insensitive column types via a special and undocumented collation. Postgres has a CITEXT type for this purpose. Supporting
CITEXT
as an alias forTEXT COLLATE en_u_ks_level2
would make this much more discoverable and usable.gz#11468
Jira issue: CRDB-5855
The text was updated successfully, but these errors were encountered: