[C++][Python] Support parsing a StringArray full of JSON to a Table #33662

Open
ryan-johnson-databricks opened this issue Jan 13, 2023 · 0 comments


ryan-johnson-databricks commented Jan 13, 2023

Describe the enhancement requested

Working with pyarrow, there doesn't seem to be any way to parse a StringArray full of JSON into an Array or Table of nested data. This surprises me, because pyarrow.json.read_json does exactly the right thing... but only for line-delimited JSON files. At least, I didn't see anything in e.g. pyarrow.compute, and a Google search came up empty.
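For illustration, here is a rough sketch of a workaround that leans on the existing line-delimited reader: join the strings into one newline-delimited buffer and hand it to pyarrow.json.read_json. The helper names `to_ndjson` and `parse_json_strings` are hypothetical, not pyarrow APIs. It assumes every string is a single-line JSON object with no raw newlines inside it, and it substitutes `{}` for nulls on the assumption that an empty object comes back as a row of nulls.

```python
import io


def to_ndjson(strings):
    """Join JSON documents into a single newline-delimited UTF-8 buffer.

    Assumption: each non-null string is one JSON object on one line.
    Nulls are replaced by an empty object so the row count is preserved.
    """
    lines = ("{}" if s is None else s for s in strings)
    return "\n".join(lines).encode("utf-8") + b"\n"


def parse_json_strings(strings):
    """Parse a list of JSON strings into a pyarrow Table (sketch only)."""
    # Imported here so that to_ndjson stays usable without pyarrow installed.
    import pyarrow.json as pj

    # read_json accepts a file-like object as well as a path.
    return pj.read_json(io.BytesIO(to_ndjson(strings)))
```

With a StringArray `arr`, this would be called as `parse_json_strings(arr.to_pylist())`. That round-trips every value through Python objects and copies the data, which is exactly the overhead a native compute function would avoid.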

Looking deeper there doesn't seem to be anything on the C++ side, either.

Skimming the C++ sources, I think there's core logic that could be wrapped up into a proper compute function. If I read correctly:

I think to parse a StringArray (instead of a file), we'd just need an ArrayParser variant, similar to BlockParser, that presents rapidjson with a different buffer for each string to be parsed (maybe a std::spanstream), so that EOF becomes the "delimiter" instead of newline?
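To make the intended behavior concrete, here is a pure-Python illustration of "one buffer per string": each element is parsed as a complete document, so the end of the string acts as the delimiter and embedded newlines are harmless. This is not the proposed C++ ArrayParser, only a sketch of what it would compute. The function name `parse_each` is hypothetical, and it assumes every non-null string holds a JSON object.

```python
import json


def parse_each(strings):
    """Parse each string as its own JSON document and return columns.

    Assumption: every non-null string is a JSON object. Null inputs and
    missing keys both become None in the output columns.
    """
    rows = [None if s is None else json.loads(s) for s in strings]

    # Collect field names in first-seen order (dicts preserve insertion order).
    keys = {}
    for row in rows:
        if row is not None:
            for key in row:
                keys.setdefault(key, None)

    # Build one list per field, aligned with the input rows.
    return {
        key: [None if row is None else row.get(key) for row in rows]
        for key in keys
    }


columns = parse_each(['{"a": 1, "b": {"c": "x"}}', '{\n  "a": 2\n}', None])
```

The second input spans several lines, which a newline-delimited parser would need special handling for but which parses fine here because nothing splits on newlines.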

Component(s)

C++, Python

@kou kou changed the title Support parsing a StringArray full of JSON to a Table [C++][Python] Support parsing a StringArray full of JSON to a Table Jan 15, 2023