Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: Iterator Key() and Value() no longer return a copy #168

Merged
merged 12 commits into from
Jul 15, 2024
Merged

Conversation

alesforz
Copy link
Contributor

@alesforz alesforz commented Jul 11, 2024

Context

Closes #156.

Changes

The:

  • Iterator.Key() API implementations of levelDB, badgerDB, and pebbleDB no longer return a copy of the key.
  • Iterator.Value() API implementations of levelDB and pebbleDB no longer return a copy of the value.

Rather, as the updated docs reflect, the caller is responsible for creating a copy of the slice if they want to modify it.

See my further comments about rocksdb and badgerdb below.

alesforz added 5 commits July 10, 2024 13:45
- returns the key itself, not a copy.
- updated docs to reflect the change
- returns the key itself, not a copy.
- updated docs to reflect the change
- returns the key itself, not a copy.
- updated docs to reflect the change.
To reflect the changes, the tests now copy the iterator's key for further use
@alesforz alesforz added the enhancement New feature or request label Jul 11, 2024
@alesforz alesforz self-assigned this Jul 11, 2024
@alesforz
Copy link
Contributor Author

alesforz commented Jul 11, 2024

Unfortunately, I don't think we can avoid the copy in rocksdb Key() and Value() APIs implementations.
Under the hood, the library we use allocates and returns a C array, which is subject to C's memory resource policy.
But Go's memory management can't keep track of any memory allocated in the C library.
Therefore, to ensure there are no memory leaks, we call Free() to deallocate the C array's allocated memory.

In short, my understanding is that we need the copy of the key/value in order to free its memory allocated in the C library, thus avoiding possible memory leaks.

- returns the value itself, not a copy
- updated tests to reflect the change
- updated docs to reflect the change
@alesforz
Copy link
Contributor Author

alesforz commented Jul 11, 2024

We can't avoid the copy in the Value() API of badgerDB either.
Namely, badgerDB exposes 2 APIs to retrieve a value:

func (item *Item) Value(fn func(val []byte) error) error
func (item *Item) ValueCopy(dst []byte) ([]byte, error)

Note:

  • Item.Value does not return the value. Instead, it wants a closure that works on it directly inside the library's boundaries. The Value method will call fn passing the value we want to retrieve as an argument []byte. To extract it we must do something like:
    var valueCopy []byte
    err = item.Value(func(val []byte) error {
        // Make a copy of the value.
        valueCopy = append([]byte{}, val...)
        return nil
    })
    That is, we have to manually copy the value.
  • Item.ValueCopy creates a copy for us and returns it.

In both cases, the library returns as a copy of the original value; as far as I can tell, it offers no possibility of accessing the original value directly.

@alesforz
Copy link
Contributor Author

alesforz commented Jul 11, 2024

Given the discussion above, addressing #168 makes the library behave inconsistently depending on the underlying database used. Namely, some databases require callers to make copies of key and value, while others do not. Consequently, users cannot switch between databases without first checking the behavior of their Key() and Value() APIs and adapting their code accordingly.

Copy link
Contributor

@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good so far 👍

badger_db.go Show resolved Hide resolved
@@ -279,19 +279,22 @@ func (i *badgerDBIterator) Valid() bool {
return true
}

// Key implements Iterator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Key implements Iterator.

I'd remove these comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I would leave them, because I think they give readers some additional context about the functions.
Granted, in the context of this repo it's clear what the functions refer to.
Still, I would argue it doesn't hurt to leave extra-info in.

@melekes
Copy link
Contributor

melekes commented Jul 12, 2024

Note: this is a breaking change.

That's debatable because, in the Iterator documentation we say read-only []byte, although we don't mention anything about the lifetime of the pointer.

@alesforz
Copy link
Contributor Author

Note: this is a breaking change.

That's debatable because, in the Iterator documentation we say read-only []byte, although we don't mention anything about the lifetime of the pointer.

I see your point.
The Iterator interface docs states:

// As with DB, keys and values should be considered read-only, and must be copied before they are
// modified.

therefore, users of the library know that they must create a copy, thus these changes shouldn't break anything.

@alesforz alesforz marked this pull request as ready for review July 12, 2024 11:03
@alesforz alesforz requested a review from a team as a code owner July 12, 2024 11:03
Copy link
Contributor

@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@melekes melekes changed the title Iterator.Key() API no longer returns a copy of the key feat!: Iterator Key() and Value() no longer return a copy Jul 13, 2024
@melekes
Copy link
Contributor

melekes commented Jul 13, 2024

Could you add a changelog entry? btw we're using https://github.com/informalsystems/unclog

@alesforz alesforz added this pull request to the merge queue Jul 15, 2024
Merged via the queue into main with commit 8ff6942 Jul 15, 2024
6 checks passed
@alesforz alesforz deleted the 156-iter-key branch July 15, 2024 08:27
github-merge-queue bot pushed a commit to cometbft/cometbft that referenced this pull request Jul 31, 2024
…s. (#3541)

Related to cometbft/cometbft-db#168.

#### Changes
We recently updated cometbft-db `Iterator` type `Key()` and `Value()`
APIs to return their values directly instead of a copy.
Therefore, this PR ensures that code in cometbft that modifies or stores
references to the return value of those APIs creates a copy before
continuing.

---

#### PR checklist

~- [ ] Tests written/updated~
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
mergify bot pushed a commit to cometbft/cometbft that referenced this pull request Jul 31, 2024
…s. (#3541)

Related to cometbft/cometbft-db#168.

#### Changes
We recently updated cometbft-db `Iterator` type `Key()` and `Value()`
APIs to return their values directly instead of a copy.
Therefore, this PR ensures that code in cometbft that modifies or stores
references to the return value of those APIs creates a copy before
continuing.

---

#### PR checklist

~- [ ] Tests written/updated~
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

(cherry picked from commit 3be35f0)
alesforz added a commit to cometbft/cometbft that referenced this pull request Jul 31, 2024
…s. (backport #3541) (#3598)

Related to cometbft/cometbft-db#168.

#### Changes
We recently updated cometbft-db `Iterator` type `Key()` and `Value()`
APIs to return their values directly instead of a copy.
Therefore, this PR ensures that code in cometbft that modifies or stores
references to the return value of those APIs creates a copy before
continuing.

---

#### PR checklist

~- [ ] Tests written/updated~
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request #3541 done by
[Mergify](https://mergify.com).

Co-authored-by: Alessandro Sforzin <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Aug 9, 2024
* Revert "remove deprecated boltdb and cleveldb (#155)"

This reverts commit badc0b8.

We decided to reinstate boltDB and clevelDB and mark them as deprecated until a
future version of CometBFT in which we'll drop cometbft-db and support only 1 backend
DB.

* updated cleveldb Iterator API docs to conform to the changes made in #168

* updated go.mod

* updated boltDB Iterator APIs to conform to the changes made in #168

* added changelog entry

* Formatting

Co-authored-by: Daniel <[email protected]>

* formatting to please the linter

* added additional context in the docs of backend types constants

---------

Co-authored-by: Daniel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make a second DB "IterKey" API that returns a key to the caller that can potentially mutate
2 participants