Ensure Parquet schema metadata is added to arrow table (#137)
* Add parquet schema metadata to arrow table
* Better findings
* fix exporting metadata
* Add tests for metadata preservation via ffi
* smaller diff
* Update parquet schema metadata
* Update parquet test
* update comment
* ensure valid with column projection
1 parent e0088bc, commit fcdf5b8
Showing 7 changed files with 88 additions and 7 deletions.
@@ -1,3 +1,4 @@
*.parquet
*.whl

# Generated by Cargo
@@ -0,0 +1,26 @@
import pyarrow as pa
import pyarrow.parquet as pq
from arro3.io import read_parquet, write_parquet


def test_copy_parquet_kv_metadata():
    metadata = {"hello": "world"}
    table = pa.table({"a": [1, 2, 3]})
    write_parquet(
        table,
        "test.parquet",
        key_value_metadata=metadata,
        skip_arrow_metadata=True,
    )

    # Assert metadata was written, but arrow schema was not
    pq_meta = pq.read_metadata("test.parquet").metadata
    assert pq_meta[b"hello"] == b"world"
    assert b"ARROW:schema" not in pq_meta.keys()

    # When reading with pyarrow, kv meta gets assigned to table
    pa_table = pq.read_table("test.parquet")
    assert pa_table.schema.metadata[b"hello"] == b"world"

    reader = read_parquet("test.parquet")
    assert reader.schema.metadata[b"hello"] == b"world"
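
The test above checks that file-level Parquet key/value metadata ends up on the Arrow schema returned by the reader. The behaviour it asserts can be sketched in plain Python as below. This is only an illustration under stated assumptions, not the arro3 implementation: `merge_kv_metadata` is a hypothetical helper, and both dropping the `ARROW:schema` entry and letting file-level values win on a key collision are assumptions made for this sketch.

```python
# Hypothetical helper illustrating the behaviour the test asserts;
# it is not part of arro3 or pyarrow.
ARROW_SCHEMA_KEY = b"ARROW:schema"


def merge_kv_metadata(schema_metadata, parquet_kv_metadata):
    """Return schema metadata extended with Parquet file-level key/value pairs.

    Assumptions of this sketch: the serialized Arrow schema entry is dropped,
    and file-level values overwrite existing schema values on a key collision.
    """
    merged = dict(schema_metadata or {})
    for key, value in (parquet_kv_metadata or {}).items():
        if key == ARROW_SCHEMA_KEY:
            # Skip the embedded Arrow schema; it is not user metadata.
            continue
        merged[key] = value
    return merged


merged = merge_kv_metadata({}, {b"hello": b"world", ARROW_SCHEMA_KEY: b"..."})
assert merged[b"hello"] == b"world"
assert ARROW_SCHEMA_KEY not in merged
```

Keys and values are kept as `bytes`, matching how the test indexes `schema.metadata` with `b"hello"`.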