BUG: Unexpected behavior pandas.DataFrame.replace with "string" dtype #41333

camilogutierrez · 2021-05-05T16:12:13Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

I'm trying to replace certain characters in a DataFrame. In this case, I want to replace ñ with n and Ñ with N.

Code Sample

import pandas as pd

df = pd.DataFrame([['añoos', 'asññ', 'ÑolÑÑss'], ['añoos', 'asññ', 'ÑolÑÑss']], dtype='string')
df.replace({'ñ': 'n', 'Ñ': 'N'}, regex=True, inplace=True)
print(df)

The output is:

	0	1	2
0	añoos	asññ	ÑolÑÑss
1	añoos	asññ	ÑolÑÑss

Problem description

If I do not set the dtype to "string", the output is as expected:

Expected Output

import pandas as pd

df = pd.DataFrame([['añoos', 'asññ', 'ÑolÑÑss'], ['añoos', 'asññ', 'ÑolÑÑss']])
df.replace({'ñ': 'n', 'Ñ': 'N'}, regex=True, inplace=True)
print(df)

	0	1	2
0	anoos	asnn	NolNNss
1	anoos	asnn	NolNNss
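
A possible workaround for the "string" dtype case, offered only as a sketch: apply `Series.str.replace` column by column instead of calling `DataFrame.replace`. The helper name `replace_chars` and the `replacements` dict are illustrative and not part of pandas, and this assumes every column holds string data.

```python
import pandas as pd

df = pd.DataFrame(
    [["añoos", "asññ", "ÑolÑÑss"], ["añoos", "asññ", "ÑolÑÑss"]],
    dtype="string",
)

# Illustrative mapping of pattern -> replacement (same pairs as in the report).
replacements = {"ñ": "n", "Ñ": "N"}


def replace_chars(column: pd.Series) -> pd.Series:
    """Apply each pattern in turn through the .str accessor."""
    for pattern, repl in replacements.items():
        column = column.str.replace(pattern, repl, regex=True)
    return column


# DataFrame.apply passes each column to the helper as a Series.
fixed = df.apply(replace_chars)
print(fixed)
```

This goes through the string accessor rather than `DataFrame.replace`, so it sidesteps the code path reported here instead of fixing it.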

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : 2cb9652
python : 3.8.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : es_419.UTF-8
LOCALE : Spanish_Colombia.1252

pandas : 1.2.4
numpy : 1.19.4
pytz : 2021.1
dateutil : 2.8.1
pip : 21.0.1
setuptools : 54.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.2
html5lib : None
pymysql : 0.9.3
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 7.22.0
pandas_datareader: None
bs4 : 4.6.0
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : None
odfpy : None
openpyxl : 3.0.6
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.5.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.52.0


attack68 · 2021-05-05T19:09:18Z

This can be simplified to:

df = pd.DataFrame([['xax', 'xbx']], dtype='string')
df = df.replace({'a': 'c', 'b': 'd'}, regex=True)

I confirm the issue exists on master.
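
To check whether a given pandas installation is affected, the simplified reproducer can be compared against the same call on a frame built without `dtype='string'`. This is only a sketch: the names `mapping`, `as_string`, `as_object`, and `affected` are illustrative, and it assumes `replace` returns without raising in both cases.

```python
import pandas as pd

mapping = {"a": "c", "b": "d"}

# Same data and same replace call, with and without the "string" dtype.
as_string = pd.DataFrame([["xax", "xbx"]], dtype="string").replace(mapping, regex=True)
as_object = pd.DataFrame([["xax", "xbx"]]).replace(mapping, regex=True)

# Cast to object before comparing so the dtype difference itself is ignored.
affected = as_string.astype(object).values.tolist() != as_object.values.tolist()

print(pd.__version__, "affected" if affected else "not affected")
```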

camilogutierrez added the Bug and Needs Triage labels May 5, 2021

attack68 added the Strings label and removed the Needs Triage label May 5, 2021

mzeitlin11 mentioned this issue May 6, 2021

BUG: replace with regex raising for StringDType #41343 (merged)

simonjayhawkins added this to the 1.3 milestone May 9, 2021

jreback closed this as completed in #41343 May 11, 2021
