-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] Implement Dict.popitem()
method
#2794
Conversation
Signed-off-by: Manuel Saelices <[email protected]>
Dict.popitem()
method
Dict.popitem()
methodDict.popitem()
and Dict.setdefault()
methods (WIP)
Have you seen #2701 by @jayzhan211? |
Closed in favor of: #2701 |
Dict.popitem()
and Dict.setdefault()
methods (WIP)Dict.popitem()
method
Since I'd recommend adding a changelog entry and updating the raise to be more descriptive like |
Signed-off-by: Manuel Saelices <[email protected]>
Signed-off-by: Manuel Saelices <[email protected]>
Signed-off-by: Manuel Saelices <[email protected]>
Re-opened and synced from |
Signed-off-by: Manuel Saelices <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good — just one minor wording suggestion on the changelog entry, and then I'm happy to import and merge this.
stdlib/src/collections/dict.mojo
Outdated
""" | ||
if self.size == 0: | ||
raise "KeyError: popitem(): dictionary is empty" | ||
var entry = self._entries.pop(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we follow the default behaviour like python in LIFO order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are completely right. Fixed here: msaelices@a034f7b
Can we try another approach that is not using What do you think @JoeLoser @msaelices ? |
SGTM! |
Signed-off-by: Manuel Saelices <[email protected]>
Co-authored-by: Joe Loser <[email protected]> Signed-off-by: Manuel Saelices <[email protected]>
Why do we need to iterate? To get the FILO behaviour we can just do |
Signed-off-by: Manuel Saelices <[email protected]>
stdlib/src/collections/dict.mojo
Outdated
""" | ||
if self.size == 0: | ||
raise "KeyError: popitem(): dictionary is empty" | ||
var entry = self._entries.pop(self.size - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn _insert(inout self, owned entry: DictEntry[K, V]):
self._maybe_resize()
var found: Bool
var slot: Int
var index: Int
found, slot, index = self._find_index(entry.hash, entry.key)
self._entries[index] = entry^
if not found:
self._set_index(slot, index)
self.size += 1
self._n_entries += 1
I think index
in the _entries
is computed by hash function, which mean the element index is not ordered. You pop the item based on the index of self.size - 1
, which may not always be the last item
we expect.
You can try to pop and insert the dict to mix the dictionary, I think it is possible to get the unexpected result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iterator including reversed iterator both check if the index in entries has item or not
var opt_entry_ref = self.src[]._entries.__get_ref(self.index
@always_inline
fn __next__(inout self) -> Self.ref_type:
while True:
@parameter
if forward:
debug_assert(
self.index < self.src[]._reserved, "dict iter bounds"
)
else:
debug_assert(self.index >= 0, "dict iter bounds")
var opt_entry_ref = self.src[]._entries.__get_ref(self.index)
if opt_entry_ref[]:
@parameter
if forward:
self.index += 1
else:
self.index -= 1
self.seen += 1
return opt_entry_ref[].value()[]
@parameter
if forward:
self.index += 1
else:
self.index -= 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure of it. Please review the _DictEntryIter
, I'm deleted non-relevant fragments for this question:
@value
struct _DictEntryIter[
... # omitted
forward: Bool = True,
]:
"""Iterator over immutable DictEntry references.
Parameters:
... # omitted
forward: The iteration direction. `False` is backwards.
"""
... # omitted
var index: Int
var src: Reference[Dict[K, V], dict_mutability, dict_lifetime]
@always_inline
fn __next__(inout self) -> Self.ref_type:
while True:
... # omitted
var opt_entry_ref = self.src[]._entries.__get_ref(self.index)
if opt_entry_ref[]:
@parameter
if forward:
self.index += 1
else:
self.index -= 1
return opt_entry_ref[].value()[]
@parameter
if forward:
self.index += 1
else:
self.index -= 1
... # omitted
When we call self.items()
in the Dict
, a _DictEntryIter
obj is returned. If you call reversed(self.items())
, the same obj is created with the forward
param set to False
So, as you can see in the code above, the index is an integer that is being incremented or decremented while looping over the _entries
, calling to the dict._entries.__get_ref(index)
.
This code calls the self._entries.pop(self.size - 1)
being self._entries
a List
object, which works with integers indexes.
So, I think that the code is right. I am not 100% sure as I'm just landing on the Mojo's world. I will complete the current tests to make sure it works as expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forget about it. I think you are right. I will refactor the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jayzhan211 fixed here: msaelices@c3112cd
In the end, I finished implementing a very similar approach to yours, as the tests raised fatal errors if we added a new key after calling to popitem()
. I think it's because what you said, the _entries
list is bigger than the amount of entries with holes on it when we remove items from the dict.
Now, with the current implementation, I'm not sure if we will find the same issue as in your PR code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jayzhan211 FYI: It seems @gabrieldemarmiesse has found the cause of the flakiness in your code which it was NOT caused by your code but a boundary bug in the reversed()
logic. See #2896
Now I guess we can re-introduce your code or also this one. Let's @JoeLoser decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jayzhan211 Just in case, I've added you to the changelog entry: msaelices@bd0b912
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jayzhan211 FYI: It seems @gabrieldemarmiesse has found the cause of the flakiness in your code which it was NOT caused by your code but a boundary bug in the
reversed()
logic. See #2896Now I guess we can re-introduce your code or also this one. Let's @JoeLoser decide.
I don't mind, at least the code looks good to me
Fatal errors were detected if we add a new key after calling popitem() Thanks to @jayzhan211 for the heads-up Signed-off-by: Manuel Saelices <[email protected]>
Signed-off-by: Manuel Saelices <[email protected]>
Signed-off-by: Manuel Saelices <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
…(Dict.items())` (#40974) [External] [stdlib] Fix UB in `reversed(Dict.values())` and `reversed(Dict.items())` Finally found the culprit in the flakyness that plagued us since a few week in the `test_reversed.mojo`. ### The actual bug: When iterating over a list in reverse order, we should start at `len(my_list) - 1` not at `len(my_list)`. That triggered an out of bounds access and thus was undefined behavior. ### The effect on our CI As you know, we have been seeing flakyness lately. It was documented a number of times and always related to `reverse`: * #2866 (comment) * #2369 ### Why was it passing sometimes? This is because there were `Optional[...]` in the List. Thus if the flag of the `Optional` says that no element is present, it's just skipped (the dict doesn't have an entry at this index). So the list of the Dict would often look like this: `["a", "b", "c", "d"] None` but the last `None` is actually memory that we don't have access to. Sometimes it's then skipped in the iteration making the tests pass. Sometimes it would cause segfaults because the test dict worked with strings. Sometimes we would get `wrong variant type` since we don't know what happens to the memory between None check and access. ### Why wasn't it found earlier? First of all, our Dict implementation is too complexe for what it does and thus is very good at hiding bugs. Well we did have `debug_assert` before getting the element of the `List`, but this `debug_assert` looked like this in the dict iterator: ```mojo @parameter if forward: debug_assert( self.index < self.src[]._reserved, "dict iter bounds" ) else: debug_assert(self.index >= 0, "dict iter bounds") ``` So one bound was checked when reading in one direction and the other bound was checked in the other direction. A better `debug_assert` would have been ```mojo debug_assert(0 <= self.index < self.src[]._reserved, "dict iter bounds") ``` When I worked on my PR #2718 the condition `self.index < self.src[]._reserved` didn't trigger anything since it was in the wrong branch, it was never executed. Also before, `__get_ref` didn't have any bounds checks, even when assertions were enabled. A recent commit 8d0870e adds `unsafe_get()` in List and make `__get_ref` use it. It also adds `debug_assert` to `unsafe_get()`, which means that now `__get_ref` has bounds checks if run with assertions enabled. This allowed me to catch the out of bounds access when updating #2718 making the fail deterministic and debuggable. Since we have this, the `debug_assert` in `dict.mojo` isn't necessary anymore. ### Consequences on ongoing work: * This fix have been also added to #2718 * The PR #2701 that we did with @jayzhan211 was actually correct. It was just using `reverse(Dict.items())` which was buggy at the time. After the fix is merged, we can re-revert this PR. * #2794 is not necessary anymore since the implementation by @jayzhan211 was correct. * The real cause of #2866 was found, the issue has already been closed though. * #2369 can be closed for good. * #2832 can be closed for good. ### Closing thoughts * We really need to run the unit tests with assertions enabled and add assertions whenever necessary * The dict implementation is a bit too complicated. For example, `self._reserved` is the length of the internal list. There is no need to store the length of the list twice. Let's drop this variable and use `len(self._entries)` instead. I guess this is a relic of the time when `List` wasn't completely flushed out. If had done so, it would have been ovious that we can't do `my_list.__get_ref(len(my_list))` * Iterating manually over a list like this is bug-prone. The implementation we have especially is, since ```mojo @parameter if forward: self.index += 1 else: self.index -= 1 ``` is done twice in the code, it should only be done once. While there is no bug, code duplication and complexity hides bugs. * We should iterate over the list with a list iterator, not with a custom-made iterator. This will remove a lot of code in the `dict.mojo`. Co-authored-by: Gabriel de Marmiesse <[email protected]> Closes #2896 MODULAR_ORIG_COMMIT_REV_ID: b65009dc51f1e3027f91b5b61a5b7003cb022b87
…(Dict.items())` (#40974) [External] [stdlib] Fix UB in `reversed(Dict.values())` and `reversed(Dict.items())` Finally found the culprit in the flakyness that plagued us since a few week in the `test_reversed.mojo`. ### The actual bug: When iterating over a list in reverse order, we should start at `len(my_list) - 1` not at `len(my_list)`. That triggered an out of bounds access and thus was undefined behavior. ### The effect on our CI As you know, we have been seeing flakyness lately. It was documented a number of times and always related to `reverse`: * #2866 (comment) * #2369 ### Why was it passing sometimes? This is because there were `Optional[...]` in the List. Thus if the flag of the `Optional` says that no element is present, it's just skipped (the dict doesn't have an entry at this index). So the list of the Dict would often look like this: `["a", "b", "c", "d"] None` but the last `None` is actually memory that we don't have access to. Sometimes it's then skipped in the iteration making the tests pass. Sometimes it would cause segfaults because the test dict worked with strings. Sometimes we would get `wrong variant type` since we don't know what happens to the memory between None check and access. ### Why wasn't it found earlier? First of all, our Dict implementation is too complexe for what it does and thus is very good at hiding bugs. Well we did have `debug_assert` before getting the element of the `List`, but this `debug_assert` looked like this in the dict iterator: ```mojo @parameter if forward: debug_assert( self.index < self.src[]._reserved, "dict iter bounds" ) else: debug_assert(self.index >= 0, "dict iter bounds") ``` So one bound was checked when reading in one direction and the other bound was checked in the other direction. A better `debug_assert` would have been ```mojo debug_assert(0 <= self.index < self.src[]._reserved, "dict iter bounds") ``` When I worked on my PR #2718 the condition `self.index < self.src[]._reserved` didn't trigger anything since it was in the wrong branch, it was never executed. Also before, `__get_ref` didn't have any bounds checks, even when assertions were enabled. A recent commit 8d0870e adds `unsafe_get()` in List and make `__get_ref` use it. It also adds `debug_assert` to `unsafe_get()`, which means that now `__get_ref` has bounds checks if run with assertions enabled. This allowed me to catch the out of bounds access when updating #2718 making the fail deterministic and debuggable. Since we have this, the `debug_assert` in `dict.mojo` isn't necessary anymore. ### Consequences on ongoing work: * This fix have been also added to #2718 * The PR #2701 that we did with @jayzhan211 was actually correct. It was just using `reverse(Dict.items())` which was buggy at the time. After the fix is merged, we can re-revert this PR. * #2794 is not necessary anymore since the implementation by @jayzhan211 was correct. * The real cause of #2866 was found, the issue has already been closed though. * #2369 can be closed for good. * #2832 can be closed for good. ### Closing thoughts * We really need to run the unit tests with assertions enabled and add assertions whenever necessary * The dict implementation is a bit too complicated. For example, `self._reserved` is the length of the internal list. There is no need to store the length of the list twice. Let's drop this variable and use `len(self._entries)` instead. I guess this is a relic of the time when `List` wasn't completely flushed out. If had done so, it would have been ovious that we can't do `my_list.__get_ref(len(my_list))` * Iterating manually over a list like this is bug-prone. The implementation we have especially is, since ```mojo @parameter if forward: self.index += 1 else: self.index -= 1 ``` is done twice in the code, it should only be done once. While there is no bug, code duplication and complexity hides bugs. * We should iterate over the list with a list iterator, not with a custom-made iterator. This will remove a lot of code in the `dict.mojo`. Co-authored-by: Gabriel de Marmiesse <[email protected]> Closes #2896 MODULAR_ORIG_COMMIT_REV_ID: b65009dc51f1e3027f91b5b61a5b7003cb022b87
@msaelices The PR looks good. But since #2896 was merged into nightly and fixed the flakyness exposed by #2701 , the |
Sure. Closed. |
Dict.popitem()
method