-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tsearch: add ts_rank functionality #97697
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r1, 2 of 2 files at r2, 25 of 25 files at r3, 14 of 14 files at r4, 9 of 9 files at r5, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @jordanlewis, and @rafiss)
pkg/sql/sem/builtins/tsearch_builtins.go
line 338 at r5 (raw file):
ret := make([]float32, 4) if arr.Len() < len(ret) { return ret, pgerror.New(pgcode.ArraySubscript, "array of weight is too short")
do you want to give a hint that length 4 is expected?
pkg/util/tsearch/rank.go
line 24 at r5 (raw file):
// Bitmask for the normalization integer. These define different ranking // behaviors. They're defined in Postgres in tsrank.c.
I looked at that file, but it doesn't make things much clearer. Can you provide any high level comments about what these are?
pkg/util/tsearch/rank.go
line 38 at r5 (raw file):
) func cntLen(v TSVector) int {
nit: add a function comment
pkg/util/tsearch/rank.go
line 55 at r5 (raw file):
// tsvector lexeme weights D, C, B, and A. The method parameter is a bitmask // defining different ranking behaviors, defined in the rankBehavior type. See // the Postgres documentation for more details.
is there a link?
pkg/util/tsearch/rank.go
line 71 at r5 (raw file):
} if res < 0 { res = 1e-20
nit: what does this constant mean?
pkg/util/tsearch/rank.go
line 83 at r5 (raw file):
} } // RANK_NORM_EXTDIST not applicable
why not?
pkg/util/tsearch/rank.go
line 120 at r5 (raw file):
return leafNodes[i].term.lexeme < leafNodes[j].term.lexeme }) // Then distinct: (wouldn't it be nice if Go had generics?)
It does now... does that change anything here?
pkg/util/tsearch/rank.go
line 135 at r5 (raw file):
} func findRankMatches(
nit: add a function comment. What are the parameters?
pkg/util/tsearch/rank.go
line 158 at r5 (raw file):
} func rankOr(weights [4]float32, v TSVector, q TSQuery) float32 {
add a function comment
pkg/util/tsearch/rank.go
line 187 at r5 (raw file):
// wi - should be sorted desc, // don't sort for now, just choose maximum weight. This should be corrected // Oleg Bartunov
Does this make sense to you? I'm scratching my head...
pkg/util/tsearch/rank.go
line 196 at r5 (raw file):
} func rankAnd(weights [4]float32, v TSVector, q TSQuery) float32 {
add a function comment
pkg/sql/logictest/testdata/logic_test/tsvector
line 323 at r5 (raw file):
---- 'ari':6 'base':3,13 'concept':17 'data':12,21 'form':10 'introduc':24 'model':2 'n':5 'normal':9 'relat':7,14 'sublanguag':22 'univers':20 0.075990885 '2':3 'appli':15 'certain':4 'consist':22 'discuss':13 'infer':11 'logic':10 'model':27 'oper':5 'problem':18 'redund':20 'relat':7 'section':2 'user':25 0.06079271
How about testing the other overloads of ts_rank
?
7ffd880
to
1c11d5c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @rafiss, and @rytaft)
pkg/sql/sem/builtins/tsearch_builtins.go
line 338 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
do you want to give a hint that length 4 is expected?
I'm copying the inadequate Postgres message here. I suppose we can do better, it probably won't break compatibility... Done
pkg/util/tsearch/rank.go
line 24 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I looked at that file, but it doesn't make things much clearer. Can you provide any high level comments about what these are?
Done.
pkg/util/tsearch/rank.go
line 38 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: add a function comment
Done.
pkg/util/tsearch/rank.go
line 55 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
is there a link?
Done.
pkg/util/tsearch/rank.go
line 71 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: what does this constant mean?
I have no idea, it's not documented in the Postgres source. I copied this verbatim, I really have no idea what it does. I'll clarify this fact in the code, I wish I had something better to say here.
pkg/util/tsearch/rank.go
line 83 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
why not?
Done.
pkg/util/tsearch/rank.go
line 120 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
It does now... does that change anything here?
We still don't have access to x/exp/slices at this Go version which implements slices.Compact, once that's added we can replace a whole bunch of copypasta that looks just like this throughout the codebase. I'll file an issue.
pkg/util/tsearch/rank.go
line 158 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
add a function comment
Done.
pkg/util/tsearch/rank.go
line 187 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Does this make sense to you? I'm scratching my head...
No, I unfortunately have no idea what this means. I figured I'd just include it in case it helps somebody, lol.
pkg/util/tsearch/rank.go
line 196 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
add a function comment
Done.
pkg/sql/logictest/testdata/logic_test/tsvector
line 323 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
How about testing the other overloads of
ts_rank
?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r2, 42 of 42 files at r6, 31 of 31 files at r7, 16 of 16 files at r8, 9 of 10 files at r9, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis, @rafiss, and @rytaft)
pkg/util/tsearch/tsvector.go
line 339 at r9 (raw file):
// routines like to_tsvector and to_tsquery. // It can return false in the second parameter to indicate a stopword was found. func TSLexize(config string, token string) (lexeme string, notAStopWord bool, err error) {
[super nit] The negative in notAStopWord
feels confusing, why not just stopWord
and return false by default?
pkg/util/tsearch/tsvector.go
line 376 at r9 (raw file):
pos := i + 1 if i > maxTSVectorPosition { pos = maxTSVectorPosition
[nit] Could you add a comment
// Postgres silently truncates positions larger than 16383 to 16383.
to mirror tsVectorLexer.lex
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 42 of 42 files at r6, 31 of 31 files at r7, 16 of 16 files at r8, 5 of 10 files at r9, 5 of 5 files at r10, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis and @rafiss)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @rafiss, and @rytaft)
pkg/util/tsearch/tsvector.go
line 339 at r9 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[super nit] The negative in
notAStopWord
feels confusing, why not juststopWord
and return false by default?
Done.
pkg/util/tsearch/tsvector.go
line 376 at r9 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] Could you add a comment
// Postgres silently truncates positions larger than 16383 to 16383.
to mirror
tsVectorLexer.lex
?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 38 of 38 files at r11, 33 of 33 files at r12, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball and @rafiss)
This commit adds ts_rank, the family of builtins that allow ranking of text search results. The function takes a tsquery and a tsvector and returns a float that indicates how good the match is. The function can be modified by passing in a custom array of weights that matches the text search weights A, B, C, and D, and a bitmask that controls the ranking behavior in various detailed ways. See the excellent Postgres documentation here for details: https://www.postgresql.org/docs/current/textsearch-controls.html Release note (sql change): add the ts_rank function for ranking text search query results
TFTRs! bors r+ |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from f1ce8cf to blathers/backport-release-23.1-97697: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Inadvertent leftover from cockroachdb#97697. Release note: None
99089: backup: use locality filter to restrict processes to locality r=dt a=dt This extends the previous locality filter option that pinned the backup job to a specific locality filter to also pin the processors which backup the actual row data as well, allowing assignment of a backup job to only the instances running in a particular locality. Those instances will attempt to read from the closest replica they can, however a potential optimization would be to pick an instace closest to a replica that also matches the locality filter if there is one -- currently it will likely just pick the instance closest to the leaseholder which matches the filter, however there may be an instance closer to a usable replica that is not the leaseholder. To make use of these replicas, a replica oracle that ranks replicas matching the filter would need to be passed when locality filtering placement is in-use; this optimization is left to a follow-up. Release note (sql change): the nodes involved in the execution of backups, including processing of row data and the job coordination, can now be controlled using an execution-locality filter. Epic: CRDB-9547. 99324: tsearch: remove commented out code r=jordanlewis a=jordanlewis Inadvertent leftover from #97697. Epic: None Release note: None Co-authored-by: David Taylor <[email protected]> Co-authored-by: Jordan Lewis <[email protected]>
Updates: #41288
Epic: CRDB-22357
All but the last commit are from #92966, #97677, and #97685