Add support for StringDtype (available in Pandas >=1.0) #1237

Closed
pgagarinov opened this issue Oct 27, 2020 · 5 comments · Fixed by #2319

Comments

@pgagarinov

Feature Request

Allow pandas DataFrame columns of the "string" dtype (see https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html).

Description of Problem:

Currently, the following call to Perspective's Table constructor fails:

from perspective import Table
import pandas as pd

# Both columns use the pandas "string" extension dtype (StringDtype)
df = pd.DataFrame({'a': ['aa', 'bbb'], 'b': ['dddd', 'dd']}, dtype='string')
Table(df)  # this call fails

with
PerspectiveError: Mixed datasets of numpy.ndarray and lists are not supported.

Potential Solutions:

As a first step, maybe just convert such columns internally to dtype('O') for backward compatibility. At the very least, the message "Mixed datasets of numpy.ndarray and lists are not supported." needs to be replaced with something more meaningful, since StringDtype is neither a list nor a NumPy array. It also looks like StringDtype can be converted to an Arrow array without problems (see https://github.com/pandas-dev/pandas/blob/v1.1.3/pandas/core/arrays/string_.py#L26-L97).
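The "more meaningful error" suggestion could look roughly like the sketch below. It is a hypothetical pre-ingest check (check_no_string_dtype is not a Perspective function), assuming that detecting StringDtype columns up front and refusing them with a descriptive message is acceptable behavior; it uses only public pandas API (pd.StringDtype, DataFrame.dtypes).

```python
import pandas as pd


def check_no_string_dtype(df: pd.DataFrame) -> None:
    """Hypothetical pre-ingest check (not part of Perspective): fail early
    with a descriptive message instead of a generic one."""
    # Collect every column whose dtype is pandas' StringDtype ("string").
    string_cols = [
        col for col, dtype in df.dtypes.items()
        if isinstance(dtype, pd.StringDtype)
    ]
    if string_cols:
        # Only the first literal is an f-string; the braces in the last
        # literal are printed as-is.
        raise TypeError(
            f"Columns {string_cols} use the pandas 'string' dtype (StringDtype), "
            "which is not supported; cast them to object first, "
            "e.g. df.astype({col: object for col in string_cols})."
        )
```

With this check, the reproducer above would fail with a message naming columns 'a' and 'b', while df.astype(object) would pass the check.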

@timkpaine
Member

timkpaine commented Oct 27, 2020

We'll never support pure transparent pandas compatibility, so your best bet is to convert to stable APIs (like object) before trying to ingest. We will try to keep up with core developments, but our current pandas target is 0.22, so in the short term we are more likely to pin pandas<1 than to update support for every new feature.
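A minimal sketch of that advice, applied on the user side before ingestion. string_columns_to_object is a hypothetical helper name, and replacing missing values with None is our assumption about what an object-based loader expects; the cast itself is plain pandas (astype(object)).

```python
import pandas as pd


def string_columns_to_object(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df in which every StringDtype
    column is cast to plain object dtype, with missing values as None."""
    out = df.copy()
    for col, dtype in df.dtypes.items():
        if isinstance(dtype, pd.StringDtype):
            # astype(object) keeps missing values as pd.NA (or NaN for some
            # string dtype variants); map them to None so the column holds
            # only Python str objects and None.
            values = [None if pd.isna(v) else v for v in df[col].astype(object)]
            # Pass dtype=object explicitly so pandas does not re-infer a
            # string dtype for the new column.
            out[col] = pd.Series(values, index=df.index, dtype=object)
    return out
```

The converted frame is what would then be passed to Table, e.g. Table(string_columns_to_object(df)); we have not verified that call against a specific Perspective release.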

@timkpaine added the Python and question labels Oct 27, 2020
@texodus
Member

texodus commented Oct 28, 2020

What is "pure transparent pandas compatibility"? Support for dtype="string" seems simple to add, as does support for newer Pandas versions >1.

@texodus removed the question label Oct 28, 2020
@timkpaine
Member

timkpaine commented Oct 28, 2020

@texodus As an example, pivot deconstruction and reconstruction from pandas is mostly broken right now, and there are many scenarios where you can't just go from pandas into Perspective and get the same results; e.g., from a pivot table you might not be able to unpivot and repivot.

Also, we should be careful to avoid features explicitly marked "unstable", whether or not they exist in a newer version.

@timkpaine
Member

From the docs "StringDtype is considered experimental. The implementation and parts of the API may change without warning."

@MaDufie
Contributor

MaDufie commented Jul 18, 2023

Hey, I have been looking into this issue and have been able to reproduce it. Since the issue has been open for a while, I was wondering whether anyone has found a possible approach. I tried the potential solution above and some variations of it (dtype('str'), dtype(np.str_)), and that seems to resolve the error:
PerspectiveError: Mixed datasets of numpy.ndarray and lists are not supported.

but it produces a new error:
TypeError: Cannot interpret 'string[python]' as a data type.
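This TypeError is the message NumPy raises when a pandas extension dtype such as StringDtype is passed where a NumPy dtype is expected, for example to np.dtype or np.issubdtype. Assuming (we have not traced Perspective's source) that the ingestion path classifies columns with NumPy dtype checks like these, one way around it is to test for pd.StringDtype before calling any NumPy dtype function. column_kind is a hypothetical helper, not part of Perspective, and its type names are illustrative only.

```python
import numpy as np
import pandas as pd


def column_kind(series: pd.Series) -> str:
    """Hypothetical helper: map a column's dtype to a coarse type name
    without handing pandas extension dtypes to NumPy."""
    dtype = series.dtype
    # Test for pandas' StringDtype first: NumPy cannot interpret extension
    # dtypes, so np.issubdtype(dtype, ...) would raise TypeError here.
    if isinstance(dtype, pd.StringDtype):
        return "string"
    # Only genuine NumPy dtypes reach the NumPy-based checks below.
    if isinstance(dtype, np.dtype):
        if np.issubdtype(dtype, np.bool_):
            return "boolean"
        if np.issubdtype(dtype, np.integer):
            return "integer"
        if np.issubdtype(dtype, np.floating):
            return "float"
        if np.issubdtype(dtype, np.datetime64):
            return "datetime"
        if dtype == np.dtype(object):
            return "object"
    return "unknown"


strings = pd.Series(["aa", "bbb"], dtype="string")
try:
    np.issubdtype(strings.dtype, np.number)  # NumPy-only check on the raw dtype
except TypeError as exc:
    print("NumPy rejects the dtype:", exc)
print(column_kind(strings))  # prints "string"
```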

MaDufie added a commit to MaDufie/perspective that referenced this issue Jul 30, 2023
texodus added a commit that referenced this issue Nov 9, 2023
4 participants