Working with pyarrow, there doesn't seem to be any way to parse a StringArray full of JSON into an Array or Table of nested data. This surprises me, because pyarrow.json.read_json does exactly the right thing... but only for line-delimited JSON files. At least, I didn't see anything in pyarrow.compute, and a Google search came up empty.
Looking deeper, there doesn't seem to be anything on the C++ side, either.
Skimming the C++ sources, I think there's core logic that could be wrapped up into a proper compute function. If I read correctly:
which uses a HandlerBase instance that implements BlockParser
whose doParse method uses a rapidjson Reader to do the heavy lifting.
I think to parse a StringArray (instead of a file), we'd just need an ArrayParser variant, similar to BlockParser, that presents rapidjson with a different buffer for each string to be parsed --- maybe a std::spanstream --- so that EOF becomes the "delimiter" instead of newline?
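To make the "EOF as delimiter" idea concrete, here is a rough pure-Python sketch using only the standard json module. It is not Arrow or rapidjson code, and the helper names parse_each and parse_line_delimited are made up for illustration. It contrasts giving each string its own parse (end of string ends the document) with a naive join-and-split-on-newline approach, which trips over pretty-printed values and loses the alignment of null entries.

```python
import json

def parse_each(strings):
    """Parse every element as its own JSON document.

    End of string marks end of document, so embedded newlines are harmless
    and None entries keep their position in the output.
    """
    return [None if s is None else json.loads(s) for s in strings]

def parse_line_delimited(strings):
    """Naive stand-in for a line-delimited reader (hypothetical helper).

    Joins the non-null strings with newlines and treats every line as one
    document, which is what a simple newline-based splitter would do.
    """
    blob = "\n".join(s for s in strings if s is not None)
    return [json.loads(line) for line in blob.split("\n") if line.strip()]

# Stand-in for the contents of a StringArray: one compact document,
# one null, and one pretty-printed document containing newlines.
docs = [
    '{"a": 1, "b": {"c": "x"}}',
    None,
    '{\n  "a": 2,\n  "b": {"c": "y"}\n}',
]

per_element = parse_each(docs)

try:
    parse_line_delimited(docs)
    line_delimited_ok = True
except json.JSONDecodeError:
    # The pretty-printed document is split mid-object, so a line fails to parse.
    line_delimited_ok = False
```

A real ArrayParser would presumably feed each string's bytes to the existing handler rather than building Python objects, but the per-element boundary handling would be the same in spirit.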
Component(s)
C++, Python
kou changed the title from "Support parsing a StringArray full of JSON to a Table" to "[C++][Python] Support parsing a StringArray full of JSON to a Table" on Jan 15, 2023