-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Implement Cassandra/ScyllaDB Go Online Store #138
Conversation
ed81825
to
95e0544
Compare
# Conflicts: # go.mod # go.sum # sdk/python/feast/infra/online_stores/contrib/cassandra_online_store/cassandra_online_store.py
// parse protocolVersion | ||
protocolVersion, ok := onlineStoreConfig["protocol_version"] | ||
if !ok { | ||
protocolVersion = 4.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor thing but I remember facing some connection issues with 3.0 or without. Have you come across any issues when using any other Protocol version? Might be better to keep protocolVersion to 4.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't come across such issues
) | ||
} | ||
|
||
func TestOnlineRead_RejectsDifferentFeatureViewsInSameRead(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test for OnlineRead with valid inputs?
LGTM, added some comments to address. |
} | ||
} else { | ||
// Return FeatureData with a null value | ||
rowFeatures[featureName] = FeatureData{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this piece of code is repeating, would be nicer to call a helper function instead, something like :
func buildNullFeatureData(featureName, featureViewName string) FeatureData {
return FeatureData{
Reference: serving.FeatureReferenceV2{
FeatureViewName: featureViewName,
FeatureName: featureName,
},
Value: types.Value{
Val: &types.Value_NullVal{NullVal: types.Null_NULL},
},
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this else block entirely, if we don't put this FeatureData
in the rowFeatures
the final loop of the goroutine will fill the slot with Null
} | ||
|
||
if err := scanner.Err(); err != nil { | ||
errorsChannel <- errors.New("failed to scan features: " + err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we modify the error message to have more context when you look at it?
fmt.Errorf("failed to scan features for entity key '%s' ......)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this part we only have access to a serialized entity key. It's going to look something like this: 0200000064656d6f5f6b657904000000080000000000000000000000
, I don't think it's useful to include it as the customer won't know what it is (they input a normal key, for the sample above it was 0
).
return results, nil | ||
} | ||
|
||
func (c *CassandraOnlineStore) Destruct() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this Destruct() do? Maybe we should close the session here if my understanding is correct?
I see you are opening a session with "CreateTracedSession"...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was following the lead of redisonlinestore.go
with empty Destruct()
. TBH I guess it won't matter a whole lot, as when Destruct()
needs to be called, the container is getting killed anyway, maybe the db will have a dead connection for a while.
Anyway, closing the session there now, thanks.
// parse hosts | ||
cassandraHosts, ok := onlineStoreConfig["hosts"] | ||
if !ok { | ||
cassandraConfig.hosts = []string{"127.0.0.1"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use constants wherever possible?
Example:
const DefaultCassandraHost = "127.0.0.1"
Wherever you're using hostnames, default values, etc.. Makes it easier to modify later instead of changeing it throughout the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would rather not, these values are only used in one place (extractCassandraConfig(...)
) I think constants in this case make it harder to read without an IDE as you have to jump around.
} | ||
cassandraConfig.hosts = cassandraHostsStr | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we can condense lines 70 to 90 or wherever you are parsing a string field for the Cassandra config? Something like the following? We could reduce the lines of code. We can even use godoc conventions to explain certain functions if it seems complicated.
func parseStringField(config map[string]any, key, defaultValue string) (string, error) {
rawValue, ok := config[key]
if !ok {
log.Warn().Msgf("%s not defined: Using default value '%s'", key, defaultValue)
return defaultValue, nil
}
value, ok := rawValue.(string)
if !ok {
return "", fmt.Errorf("failed to convert %s to string: %v", key, rawValue)
}
return value, nil
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Cassandra/ScyllaDB
What this PR does / why we need it:
This PR is to implement the ScyllaDB online store which may incur in lower costs when compared to Redis for certain use cases. More details at https://confluence.expedia.biz/display/~msistla/Scylla+DB+Online+Store+Implementation
Change Summary:
Highlights
Concurrent Queries for Single
OnlineRead(...)
callsWhen a single
OnlineRead(...)
requests multiple keys, this online store will generate one CQL query per key like this:and run them in parallel; this is to avoid the potential performance cost of having
WHERE "entity_key" in ('key1', 'key2')
.DataDog Tracing
By leveraging https://github.com/DataDog/dd-trace-go/blob/main/contrib/gocql/gocql/gocql.go, we're able to have a detailed trace that tells us just how much time we're spending retrieving the features and how much overhead the feature server is actually adding. Below is an image of a trace for retrieval for 3 different keys.
Testing
Did a test deployment of this feature server in test us-west-2 for testing retrieval and tracing.
Known Limitations/TODOs
IN
is used for limiting the number of features retrieved, we may hit a query limit when fetching too many features in a single query. We may need to partition queries to retrieve less features at a time if necessary.