Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lakefs was crashing when using database.type=”dynamodb” and using #6924

Merged
merged 2 commits into from
Nov 2, 2023

Conversation

shvechkov
Copy link
Contributor

@shvechkov shvechkov commented Nov 2, 2023

using scylladb as a database (scylladb provides dynamodb API which can be enabled via alternator feature -
https://opensource.docs.scylladb.com/stable/alternator/alternator.html)

Apparently scylladb does not fully follow the spec and does not return ConsumedCapacity. Adding check for nil pointer to avoid crashes in lakefs.

Closes #(6923)
#6923

Change Description

simply checking for nil pointer to avoid crash

Background

Running scylladb locally with dynamodb api enabled via alternator feature (https://www.scylladb.com/alternator/)
Lakefs crashes in pkg/kv/dynamodb/store.go: 477 since it is getting nil pointer in queryResult.ConsumedCapacity

// dynamoConsumedCapacity.WithLabelValues(operation).Add(*queryResult.ConsumedCapacity.CapacityUnits)

database:
type: "dynamodb"
dynamodb:
table_name: "kvstore"
endpoint: "http://127.0.0.1:8777/"

Apparently scylladb did not fully follow the spec and returns nil in ConsumedCapacity but crashing is not a good option
Share context and relevant information for the PR: offline discussions, considerations, design decisions etc.

Bug Fix

If this PR is a bug fix, please let us know about:

  1. Problem - The reason for opening the bug
  2. Root cause - Discovered root cause after investigation
  3. Solution - How the bug was fixed

New Feature

If this PR introduces a new feature, describe it here.

Testing Details

How were the changes tested?
compiled lakefs from changed srcs and tested against scylldb with dynamodb API enabled (local scylladb cluster).
Did sanity test - all works as expected

Breaking Change?

Nope
Does this change break any existing functionality? (API, CLI, Clients)

Additional info

Logs, outputs, screenshots of changes if applicable (CLI / GUI changes)

Contact Details

[email protected]

How can we get in touch with you if we need more info? (ex. [email protected])

scylladb as a  database (scylladb provides dynamodb API  which can
be enabled via alternator feature -
https://opensource.docs.scylladb.com/stable/alternator/alternator.html)

Apparently scylladb does not fully follow spec and does not return ConsumedCapacity.
Adding check for nil pointer to avoid crashes in lakefs.
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution! This looks like a very useful fix and it is obviously useful.

I would prefer never to publish any values to the metric on ScyllaDB, rather than publish zeros, and made a suggestion,

I know of ScyllaDB but have never operated one in production. Can you expand on your use-case? If there are other issues operating with it, it might be better to write a full KV implementation using a more native ScyllaDB API than to use their DynamoDB translator.

pkg/kv/dynamodb/store.go Outdated Show resolved Hide resolved
@arielshaqed arielshaqed added area/integrations include-changelog PR description should be included in next release changelog area/KV Improvements to the KV store implementation labels Nov 2, 2023
yup, agree with all suggestions. Tested suggested change on my env locally (no issues found ) . Committing

Co-authored-by: Ariel Shaqed (Scolnicov) <[email protected]>
@shvechkov
Copy link
Contributor Author

@arielshaqed @nopcoder
Thank you for quick turnaround/suggestions. Regarding the use case: researching (educational project – nothing in production) on how to build highly available, scalable object store with immutability option (lakefs exposes s3 endpoint and provides immutability) I’m thinking about running bunch of nodes running lakefs→scylladb→ceph stack .. Lakefs is stateless, while scylladb and ceph(or some other s3 svc) are clustered and can be scaled horizontally. In this setup all lakefs instances will have the same unified vew. If some nodes go down, other nodes will still be able to handle requests (assuming that scylladb/s3 object store clusters are operational) ..
Do you see any potential pitfalls? (besides the issue of dynamo/scylla being eventually consistent … To address this I’m using alternator_write_isolation: only_rmw_uses_lwt)

@arielshaqed
Copy link
Contributor

arielshaqed commented Nov 2, 2023

Do you see any potential pitfalls? (besides the issue of dynamo/scylla being eventually consistent …

lakeFS makes some consistency demands of its object and kv stores.

  1. I know very little about Ceph, but I know it claims strong consistency, and it's probably strong enough there.
  2. I know even less about ScyllaDB, and obviously the consistency demands that kv makes on its kv are quite serious. I'm afraid I'll have to punt on this question. If you're planning to use this in production I should need to read exactly what consistency they do offer.
  3. You would be going with an untested combination. We do not test this combination, and in fact we test neither ScyllaDB nor Ceph. This currently is something of a niche configuration.

@arielshaqed
Copy link
Contributor

(To be clear: if everything is compatible enough, it will with and we won't break it.)

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@arielshaqed
Copy link
Contributor

Pulling... Esti can't run from a fork IIRC.

@arielshaqed arielshaqed merged commit f072fc2 into treeverse:master Nov 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/integrations area/KV Improvements to the KV store implementation contributor include-changelog PR description should be included in next release changelog
Projects
None yet
5 participants