-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: 35977 Adding regex support for ExtensionBlock replace method #36038
Conversation
pls merge master as we just merged a change in how replace works in internals. |
@jreback synced with upstream master also verified that this bug is still present in upstream master |
Conflicting files |
pandas/core/internals/blocks.py
Outdated
regex = validate_bool_kwarg(regex, "regex") | ||
if regex: | ||
dtype = self.values.dtype | ||
blocks = self.astype(object) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you give this a non-plural name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cgangwar11 can you address?
doc/source/whatsnew/v1.2.0.rst
Outdated
- Fixed Bug where :class:`DataFrame` column set to scalar extension type via a dict instantion was considered an object type rather than the extension type (:issue:`35965`) | ||
- Fixed bug where ``astype()`` with equal dtype and ``copy=False`` would return a new object (:issue:`284881`) | ||
- Fixed bug when applying a NumPy ufunc with multiple outputs to a :class:`pandas.arrays.IntegerArray` returning None (:issue:`36913`) | ||
- Fixed bug in :meth:`Dataframe.replace` not working for ``regex=True`` and ``dtype='string'`` Now working as expected (:issue:`35977`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change this to:
- Fixed bug in :meth:`DataFrame.replace` now returns correct result for `regex=True`` with ``string`` dtype (:issue:`35977`)
) | ||
def test_replace_string(to_replace, value, input_data, expected_data): | ||
# GH35977 | ||
input_df = pd.DataFrame({"col1": input_data}, dtype="string") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call this df
# GH35977 | ||
input_df = pd.DataFrame({"col1": input_data}, dtype="string") | ||
expected_df = pd.DataFrame({"col1": expected_data}, dtype="string") | ||
input_df = input_df.replace(to_replace, value, inplace=False, regex=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call this result
def test_replace_string(to_replace, value, input_data, expected_data): | ||
# GH35977 | ||
input_df = pd.DataFrame({"col1": input_data}, dtype="string") | ||
expected_df = pd.DataFrame({"col1": expected_data}, dtype="string") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call this expected
("f", "replace", ["d", "ee", "f", ""], ["d", "ee", "replace", ""]), | ||
], | ||
) | ||
def test_replace_string(to_replace, value, input_data, expected_data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the idiom:
result = ...
expected = ...
tm.assert_frame_equal(result, frame)
) | ||
def test_replace_regex(to_replace, value, input_data, expected_data, inplace): | ||
# GH35977 | ||
input_df = pd.DataFrame({"col1": input_data}, dtype="string") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call this df
def test_replace_regex(to_replace, value, input_data, expected_data, inplace): | ||
# GH35977 | ||
input_df = pd.DataFrame({"col1": input_data}, dtype="string") | ||
expected_df = pd.DataFrame({"col1": expected_data}, dtype="string") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call this expected
# GH35977 | ||
input_df = pd.DataFrame({"col1": input_data}, dtype="string") | ||
expected_df = pd.DataFrame({"col1": expected_data}, dtype="string") | ||
converted = input_df.replace(to_replace, value, inplace=inplace, regex=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call this df_replaced
], | ||
) | ||
def test_replace_regex(to_replace, value, input_data, expected_data, inplace): | ||
# GH35977 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the idiom:
result = ...
expected = ...
tm.assert_frame_equal(result, frame)
pandas/core/internals/blocks.py
Outdated
regex = validate_bool_kwarg(regex, "regex") | ||
if regex: | ||
dtype = self.values.dtype | ||
blocks = self.astype(object) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cgangwar11 can you address?
Also will need to merge master |
(r"f", "replace", ["d", "ee", "f", ""], ["d", "ee", "replace", ""], True), | ||
], | ||
) | ||
def test_replace_regex(to_replace, value, input_data, expected_data, inplace): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is testing DataFrame.replace, these belong in tests.frame.methods.test_replace
Hello @cgangwar11! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-11-15 07:14:13 UTC |
("f", "replace", ["d", "ee", "f", ""], ["d", "ee", "replace", ""]), | ||
], | ||
) | ||
def test_replace_string(self, to_replace, value, input_data, expected_data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you parametrize these using frame_or_series
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this addressed?
for b in block.replace(to_replace, value, inplace, regex) | ||
] | ||
block.replace(to_replace, value, inplace, regex) | ||
return block.astype(dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We recently implement Block._replace_regex. can you see if we can avoid the object-dtype cast (ideally even this whole method) by using that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace_regex is using np.vectorize function internally which gives an error when using not object dtype hence using replace_regex after typecasting.
f = np.vectorize(re_replacer, otypes=[values.dtype])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cgangwar11 what's the status of this? it sounds like you addressed most (possibly all) comments so if you merge master and move the whatsnew to 1.3 we'll take another look
One question I have is whether this affects the ExtensionDtype
Series.replace
case? If yes we'd like to test with series
Another question is whether we have asvs for this? (If not that doesn't have to be part of this PR IMO)
("f", "replace", ["d", "ee", "f", ""], ["d", "ee", "replace", ""]), | ||
], | ||
) | ||
def test_replace_string(self, to_replace, value, input_data, expected_data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this addressed?
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
closing as stale. if you want to continue, pls ping and can re-open. |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Overrides replace method to add support of regex in ExtensionBlock . If input to_replace is a regex we convert data type of block to Object and calls object replace method which supports regex otherwise we call replace method of super class Block.