-
Notifications
You must be signed in to change notification settings - Fork 990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(search): STOPWORDS #3851
feat(search): STOPWORDS #3851
Conversation
b5a7db7
to
ada95d0
Compare
Signed-off-by: Vladislav Oleshko <[email protected]>
ada95d0
to
468b98f
Compare
src/core/search/search.h
Outdated
const Schema* schema_; | ||
const IndicesOptions* options_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same shared_ptr vs ptr question here. Everything points esentially to shared_ptr that is created in the coordinator when parsing the index definition. Lifetimes are ensured by ShardDocIndex that create the FieldIndices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you feel that shared_ptr
is more correct because of lack of single owner, then lets switch to shared_ptr
bool was_built_ = false; | ||
std::shared_ptr<const DocIndex> base_; | ||
search::FieldIndices indices_; | ||
std::optional<search::FieldIndices> indices_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More elegant, creating a dummy instance with nullptrs was just an ugly hack
private: | ||
const StopWords* stopwords_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Not sure if to make it a shared_ptr (can be easily created from IndexOptions from an aliasing constructor)
- Const-ptr to indicate longevity... or use a reference instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer a const ref.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you ask then? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I forgot to make this a constref 😨 Never mind
src/core/search/search.cc
Outdated
IndicesOptions::IndicesOptions() { | ||
static absl::flat_hash_set<std::string> kDefaultStopwords{ | ||
"a", "is", "the", "an", "and", "are", "as", "at", "be", "but", "by", | ||
"for", "if", "in", "into", "it", "no", "not", "of", "on", "or", "such", | ||
"that", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"}; | ||
|
||
stopwords_ = kDefaultStopwords; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this should be part of the core module or the server module 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fine here.
Remebered: Need to add and test persistence 🤓 |
src/core/search/search.cc
Outdated
stopwords_ = kDefaultStopwords; | ||
} | ||
|
||
FieldIndices::FieldIndices(const Schema* schema, const IndicesOptions* options, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the ownership stays with the caller while this object lives, it's better using const refs
dd4344e
to
5f95cf6
Compare
Support for FT.STOPWORDS