-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] Support Dict popitem #2701
Conversation
@gabrieldemarmiesse , I open another PR since the previous one is quite messy for fixing conflict I also rewrite it differently from your previous comment, which I think it looks better! |
stdlib/src/collections/dict.mojo
Outdated
var key = item[].key | ||
var value = item[].value | ||
_ = self.pop(key) | ||
return DictEntry[K, V](key, value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
item[] is not valid at this point, since it is deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR looks simpler indeed. I'm wondering if it's future-proof though. The code is correct, but I'm afraid the compiler will not be happy with it later on.
stdlib/src/collections/dict.mojo
Outdated
for item in reversed(self.items()): | ||
var key = item[].key | ||
var value = item[].value | ||
_ = self.pop(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid that in the future this will cause issue when the compiler is stricter, no? I would expect the compiler to prevent us from changing an object during iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take a look what python does
a = {'a': 1, 'b': 2}
for k, v in a.items():
print(k, v)
item = a.popitem()
print(item)
Result
a 1
('b', 2)
Traceback (most recent call last):
File "/Users/bytedance/mojo/sample.py", line 2, in <module>
for k, v in a.items():
RuntimeError: dictionary changed size during iteration
I agree we may return error for modifying dict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dang this is way harder than I anticipated...
Maybe let's go back to using optionals?
a42b377
to
6846af8
Compare
6846af8
to
4fa4599
Compare
Please ping me when the changelog entry is updated to adhere to the markdown formatting used by CI now (this matches what we use internally to make imports easier). |
docs/changelog.md
Outdated
@@ -18,6 +18,8 @@ what we publish. | |||
|
|||
### ⭐️ New | |||
|
|||
- `Dict` now support `popitem`, which remove and return the last dict item. ([PR #2701](https://github.com/modularml/mojo/pull/2701) by [@jayzhan211](https://github.com/jayzhan211)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- `Dict` now support `popitem`, which remove and return the last dict item. ([PR #2701](https://github.com/modularml/mojo/pull/2701) by [@jayzhan211](https://github.com/jayzhan211)) | |
- `Dict` now support `popitem`, which removes and returns the last item in the `Dict`. ([PR #2701](https://github.com/modularml/mojo/pull/2701) by [@jayzhan211](https://github.com/jayzhan211)) |
stdlib/src/collections/dict.mojo
Outdated
_ = self.pop(key.value()[]) | ||
return DictEntry[K, V](key.value()[], val.value()[]) | ||
|
||
raise "KeyError" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion Let's give a more descriptive error like Python gives.
KeyError: 'popitem(): dictionary is empty'
Perhaps just s/dictionary/Dict
for our error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KeyError: 'popitem(): dictionary is empty'
This looks nice
4fa4599
to
0cd39cf
Compare
Done! @JoeLoser |
Ups, sorry, I did not know of this PR and I implemented this one with the same method: #2794 Not sure if you want to take a look. I would guess my implementation could be slightly faster Maybe you can be inspired by my PR. |
Maybe I'm mistaken but the implementation in this PR is |
Yes, you are right. I missed the |
2038a94
to
08a59a7
Compare
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
08a59a7
to
1378379
Compare
!sync |
✅🟣 This contribution has been merged 🟣✅ Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours. We use Copybara to merge external contributions, click here to learn more. |
Landed in 051cdd0! Thank you for your contribution 🎉 |
[External] [stdlib] Support `Dict.popitem()` Implement `Dict.popitem()` which removes and returns the last item in the `Dict`. Fixes #2355 --------- Co-authored-by: Jay Zhan <[email protected]> Closes #2701 MODULAR_ORIG_COMMIT_REV_ID: c93387c953e05a447e6467270c54093b85e743b0
FYI @jayzhan211 @JoeLoser, this PR introducing nondeterminism into
|
…(Dict.items())` (#40974) [External] [stdlib] Fix UB in `reversed(Dict.values())` and `reversed(Dict.items())` Finally found the culprit in the flakyness that plagued us since a few week in the `test_reversed.mojo`. ### The actual bug: When iterating over a list in reverse order, we should start at `len(my_list) - 1` not at `len(my_list)`. That triggered an out of bounds access and thus was undefined behavior. ### The effect on our CI As you know, we have been seeing flakyness lately. It was documented a number of times and always related to `reverse`: * #2866 (comment) * #2369 ### Why was it passing sometimes? This is because there were `Optional[...]` in the List. Thus if the flag of the `Optional` says that no element is present, it's just skipped (the dict doesn't have an entry at this index). So the list of the Dict would often look like this: `["a", "b", "c", "d"] None` but the last `None` is actually memory that we don't have access to. Sometimes it's then skipped in the iteration making the tests pass. Sometimes it would cause segfaults because the test dict worked with strings. Sometimes we would get `wrong variant type` since we don't know what happens to the memory between None check and access. ### Why wasn't it found earlier? First of all, our Dict implementation is too complexe for what it does and thus is very good at hiding bugs. Well we did have `debug_assert` before getting the element of the `List`, but this `debug_assert` looked like this in the dict iterator: ```mojo @parameter if forward: debug_assert( self.index < self.src[]._reserved, "dict iter bounds" ) else: debug_assert(self.index >= 0, "dict iter bounds") ``` So one bound was checked when reading in one direction and the other bound was checked in the other direction. A better `debug_assert` would have been ```mojo debug_assert(0 <= self.index < self.src[]._reserved, "dict iter bounds") ``` When I worked on my PR #2718 the condition `self.index < self.src[]._reserved` didn't trigger anything since it was in the wrong branch, it was never executed. Also before, `__get_ref` didn't have any bounds checks, even when assertions were enabled. A recent commit 8d0870e adds `unsafe_get()` in List and make `__get_ref` use it. It also adds `debug_assert` to `unsafe_get()`, which means that now `__get_ref` has bounds checks if run with assertions enabled. This allowed me to catch the out of bounds access when updating #2718 making the fail deterministic and debuggable. Since we have this, the `debug_assert` in `dict.mojo` isn't necessary anymore. ### Consequences on ongoing work: * This fix have been also added to #2718 * The PR #2701 that we did with @jayzhan211 was actually correct. It was just using `reverse(Dict.items())` which was buggy at the time. After the fix is merged, we can re-revert this PR. * #2794 is not necessary anymore since the implementation by @jayzhan211 was correct. * The real cause of #2866 was found, the issue has already been closed though. * #2369 can be closed for good. * #2832 can be closed for good. ### Closing thoughts * We really need to run the unit tests with assertions enabled and add assertions whenever necessary * The dict implementation is a bit too complicated. For example, `self._reserved` is the length of the internal list. There is no need to store the length of the list twice. Let's drop this variable and use `len(self._entries)` instead. I guess this is a relic of the time when `List` wasn't completely flushed out. If had done so, it would have been ovious that we can't do `my_list.__get_ref(len(my_list))` * Iterating manually over a list like this is bug-prone. The implementation we have especially is, since ```mojo @parameter if forward: self.index += 1 else: self.index -= 1 ``` is done twice in the code, it should only be done once. While there is no bug, code duplication and complexity hides bugs. * We should iterate over the list with a list iterator, not with a custom-made iterator. This will remove a lot of code in the `dict.mojo`. Co-authored-by: Gabriel de Marmiesse <[email protected]> Closes #2896 MODULAR_ORIG_COMMIT_REV_ID: b65009dc51f1e3027f91b5b61a5b7003cb022b87
[External] [stdlib] Support `Dict.popitem()` Implement `Dict.popitem()` which removes and returns the last item in the `Dict`. Fixes #2355 --------- Co-authored-by: Jay Zhan <[email protected]> Closes #2701 MODULAR_ORIG_COMMIT_REV_ID: c93387c953e05a447e6467270c54093b85e743b0
…(Dict.items())` (#40974) [External] [stdlib] Fix UB in `reversed(Dict.values())` and `reversed(Dict.items())` Finally found the culprit in the flakyness that plagued us since a few week in the `test_reversed.mojo`. ### The actual bug: When iterating over a list in reverse order, we should start at `len(my_list) - 1` not at `len(my_list)`. That triggered an out of bounds access and thus was undefined behavior. ### The effect on our CI As you know, we have been seeing flakyness lately. It was documented a number of times and always related to `reverse`: * #2866 (comment) * #2369 ### Why was it passing sometimes? This is because there were `Optional[...]` in the List. Thus if the flag of the `Optional` says that no element is present, it's just skipped (the dict doesn't have an entry at this index). So the list of the Dict would often look like this: `["a", "b", "c", "d"] None` but the last `None` is actually memory that we don't have access to. Sometimes it's then skipped in the iteration making the tests pass. Sometimes it would cause segfaults because the test dict worked with strings. Sometimes we would get `wrong variant type` since we don't know what happens to the memory between None check and access. ### Why wasn't it found earlier? First of all, our Dict implementation is too complexe for what it does and thus is very good at hiding bugs. Well we did have `debug_assert` before getting the element of the `List`, but this `debug_assert` looked like this in the dict iterator: ```mojo @parameter if forward: debug_assert( self.index < self.src[]._reserved, "dict iter bounds" ) else: debug_assert(self.index >= 0, "dict iter bounds") ``` So one bound was checked when reading in one direction and the other bound was checked in the other direction. A better `debug_assert` would have been ```mojo debug_assert(0 <= self.index < self.src[]._reserved, "dict iter bounds") ``` When I worked on my PR #2718 the condition `self.index < self.src[]._reserved` didn't trigger anything since it was in the wrong branch, it was never executed. Also before, `__get_ref` didn't have any bounds checks, even when assertions were enabled. A recent commit 8d0870e adds `unsafe_get()` in List and make `__get_ref` use it. It also adds `debug_assert` to `unsafe_get()`, which means that now `__get_ref` has bounds checks if run with assertions enabled. This allowed me to catch the out of bounds access when updating #2718 making the fail deterministic and debuggable. Since we have this, the `debug_assert` in `dict.mojo` isn't necessary anymore. ### Consequences on ongoing work: * This fix have been also added to #2718 * The PR #2701 that we did with @jayzhan211 was actually correct. It was just using `reverse(Dict.items())` which was buggy at the time. After the fix is merged, we can re-revert this PR. * #2794 is not necessary anymore since the implementation by @jayzhan211 was correct. * The real cause of #2866 was found, the issue has already been closed though. * #2369 can be closed for good. * #2832 can be closed for good. ### Closing thoughts * We really need to run the unit tests with assertions enabled and add assertions whenever necessary * The dict implementation is a bit too complicated. For example, `self._reserved` is the length of the internal list. There is no need to store the length of the list twice. Let's drop this variable and use `len(self._entries)` instead. I guess this is a relic of the time when `List` wasn't completely flushed out. If had done so, it would have been ovious that we can't do `my_list.__get_ref(len(my_list))` * Iterating manually over a list like this is bug-prone. The implementation we have especially is, since ```mojo @parameter if forward: self.index += 1 else: self.index -= 1 ``` is done twice in the code, it should only be done once. While there is no bug, code duplication and complexity hides bugs. * We should iterate over the list with a list iterator, not with a custom-made iterator. This will remove a lot of code in the `dict.mojo`. Co-authored-by: Gabriel de Marmiesse <[email protected]> Closes #2896 MODULAR_ORIG_COMMIT_REV_ID: b65009dc51f1e3027f91b5b61a5b7003cb022b87
close #2355
close #2542