support struct array in pretty display #579

Merged 5 commits on Jul 23, 2021
Changes from 3 commits

50 changes: 49 additions & 1 deletion arrow/src/util/display.rs
@@ -205,13 +205,36 @@ pub fn make_string_from_decimal(column: &Arc<dyn Array>, row: usize) -> Result<S
     Ok(formatted_decimal)
 }

+fn append_struct_field_string(
+    target: &mut String,
+    name: &str,
+    field_col: &Arc<dyn Array>,
+    row: usize,
+) -> Result<()> {
+    target.push('"');
+    target.push_str(name);
+    target.push_str("\": ");
+    match field_col.data_type() {
+        DataType::Utf8 | DataType::LargeUtf8 => {
+            target.push('"');
+            target.push_str(array_value_to_string(field_col, row)?.as_str());
+            target.push('"');
+        }
+        _ => {
+            target.push_str(array_value_to_string(field_col, row)?.as_str());
+        }
+    }
+
+    Ok(())
+}
+
 /// Get the value at the given row in an array as a String.
 ///
 /// Note this function is quite inefficient and is unlikely to be
 /// suitable for converting large arrays or record batches.
 pub fn array_value_to_string(column: &array::ArrayRef, row: usize) -> Result<String> {
     if column.is_null(row) {
-        return Ok("".to_string());
+        return Ok("null".to_string());
     }
     match column.data_type() {
         DataType::Utf8 => make_string!(array::StringArray, column, row),
@@ -280,6 +303,31 @@ pub fn array_value_to_string(column: &array::ArrayRef, row: usize) -> Result<Str
                 column.data_type()
             ))),
         },
+        DataType::Struct(_) => {
+            let st = column
+                .as_any()
+                .downcast_ref::<array::StructArray>()
+                .ok_or_else(|| {
+                    ArrowError::InvalidArgumentError(
+                        "Repl error: could not convert struct column to struct array."
+                            .to_string(),
+                    )
+                })?;
+
+            let mut s = String::new();
+            s.push('{');
+            let mut kv_iter = st.columns().into_iter().zip(st.column_names().into_iter());
+            if let Some((col, name)) = kv_iter.next() {
+                append_struct_field_string(&mut s, name, col, row)?;
+            }
+            for (col, name) in kv_iter {
+                s.push_str(", ");
+                append_struct_field_string(&mut s, name, col, row)?;
+            }
+            s.push('}');
+
+            Ok(s)
+        }
         _ => Err(ArrowError::InvalidArgumentError(format!(
             "Pretty printing not implemented for {:?} type",
             column.data_type()
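
For readers without the arrow crate at hand, the formatting rule in the display.rs change above can be sketched with the standard library only. This is a minimal illustration, not the PR's code: `Value` and `value_to_string` are hypothetical names, and the sketch dispatches on the value itself rather than on the column's `DataType`. One consequence of that simplification: a null string field prints as bare `null` here, whereas the `data_type()` check in the diff would wrap it in quotes.

```rust
/// Hypothetical stand-in for one cell of an Arrow column; not an arrow-rs type.
#[derive(Debug, Clone)]
pub enum Value {
    Null,
    Int(i64),
    Str(String),
    Struct(Vec<(String, Value)>),
}

/// Render a cell roughly the way the patched `array_value_to_string` does.
pub fn value_to_string(value: &Value) -> String {
    match value {
        // Nulls are printed as the literal text "null".
        Value::Null => "null".to_string(),
        Value::Int(i) => i.to_string(),
        // Top-level strings are printed bare, as in the table output.
        Value::Str(s) => s.clone(),
        Value::Struct(fields) => {
            let mut out = String::from("{");
            for (i, (name, field)) in fields.iter().enumerate() {
                // Separator goes before every field except the first.
                if i > 0 {
                    out.push_str(", ");
                }
                out.push('"');
                out.push_str(name);
                out.push_str("\": ");
                match field {
                    // String fields are wrapped in quotes inside a struct.
                    Value::Str(s) => {
                        out.push('"');
                        out.push_str(s);
                        out.push('"');
                    }
                    // Everything else, including nested structs, recurses.
                    other => out.push_str(&value_to_string(other)),
                }
            }
            out.push('}');
            out
        }
    }
}
```

Nested structs recurse through the same function, which is how the `c12` field in the new test ends up as a brace-delimited value inside `c1`.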
127 changes: 93 additions & 34 deletions arrow/src/util/pretty.rs
@@ -106,9 +106,9 @@ mod tests {
     use crate::{
         array::{
             self, new_null_array, Array, Date32Array, Date64Array, PrimitiveBuilder,
-            StringBuilder, StringDictionaryBuilder, Time32MillisecondArray,
-            Time32SecondArray, Time64MicrosecondArray, Time64NanosecondArray,
-            TimestampMicrosecondArray, TimestampMillisecondArray,
+            StringArray, StringBuilder, StringDictionaryBuilder, StructArray,
+            Time32MillisecondArray, Time32SecondArray, Time64MicrosecondArray,
+            Time64NanosecondArray, TimestampMicrosecondArray, TimestampMillisecondArray,
             TimestampNanosecondArray, TimestampSecondArray,
         },
         datatypes::{DataType, Field, Int32Type, Schema},
@@ -148,14 +148,14 @@ mod tests {
         let table = pretty_format_batches(&[batch])?;

         let expected = vec![
-            "+---+-----+",
-            "| a | b   |",
-            "+---+-----+",
-            "| a | 1   |",
-            "| b |     |",
-            "|   | 10  |",
-            "| d | 100 |",
-            "+---+-----+",
+            "+------+------+",
+            "| a    | b    |",
+            "+------+------+",
+            "| a    | 1    |",
+            "| b    | null |",
+            "| null | 10   |",
+            "| d    | 100  |",
+            "+------+------+",
         ];

         let actual: Vec<&str> = table.lines().collect();
@@ -180,8 +180,8 @@ mod tests {
         let table = pretty_format_columns("a", &columns)?;

         let expected = vec![
-            "+---+", "| a |", "+---+", "| a |", "| b |", "|   |", "| d |", "| e |",
-            "|   |", "| g |", "+---+",
+            "+------+", "| a    |", "+------+", "| a    |", "| b    |", "| null |",
+            "| d    |", "| e    |", "| null |", "| g    |", "+------+",
         ];

         let actual: Vec<&str> = table.lines().collect();
@@ -212,14 +212,14 @@ mod tests {
         let table = pretty_format_batches(&[batch]).unwrap();

         let expected = vec![
-            "+---+---+---+",
-            "| a | b | c |",
-            "+---+---+---+",
-            "|   |   |   |",
-            "|   |   |   |",
-            "|   |   |   |",
-            "|   |   |   |",
-            "+---+---+---+",
+            "+------+------+------+",
+            "| a    | b    | c    |",
+            "+------+------+------+",
+            "| null | null | null |",
+            "| null | null | null |",
+            "| null | null | null |",
+            "| null | null | null |",
+            "+------+------+------+",
         ];

         let actual: Vec<&str> = table.lines().collect();
@@ -252,7 +252,7 @@ mod tests {
             "| d1    |",
             "+-------+",
             "| one   |",
-            "|       |",
+            "| null  |",
             "| three |",
             "+-------+",
         ];
@@ -297,7 +297,7 @@ mod tests {
             "| f                   |",
             "+---------------------+",
             "| 1970-05-09 14:25:11 |",
-            "|                     |",
+            "| null                |",
             "+---------------------+",
         ];
         check_datetime!(TimestampSecondArray, 11111111, expected);
@@ -310,7 +310,7 @@ mod tests {
             "| f                       |",
             "+-------------------------+",
             "| 1970-01-01 03:05:11.111 |",
-            "|                         |",
+            "| null                    |",
             "+-------------------------+",
         ];
         check_datetime!(TimestampMillisecondArray, 11111111, expected);
@@ -323,7 +323,7 @@ mod tests {
             "| f                          |",
             "+----------------------------+",
             "| 1970-01-01 00:00:11.111111 |",
-            "|                            |",
+            "| null                       |",
             "+----------------------------+",
         ];
         check_datetime!(TimestampMicrosecondArray, 11111111, expected);
@@ -336,7 +336,7 @@ mod tests {
             "| f                             |",
             "+-------------------------------+",
             "| 1970-01-01 00:00:00.011111111 |",
-            "|                               |",
+            "| null                          |",
             "+-------------------------------+",
         ];
         check_datetime!(TimestampNanosecondArray, 11111111, expected);
@@ -349,7 +349,7 @@ mod tests {
             "| f          |",
             "+------------+",
             "| 1973-05-19 |",
-            "|            |",
+            "| null       |",
             "+------------+",
         ];
         check_datetime!(Date32Array, 1234, expected);
@@ -362,7 +362,7 @@ mod tests {
             "| f          |",
             "+------------+",
             "| 2005-03-18 |",
-            "|            |",
+            "| null       |",
             "+------------+",
         ];
         check_datetime!(Date64Array, 1111111100000, expected);
@@ -375,7 +375,7 @@ mod tests {
             "| f        |",
             "+----------+",
             "| 00:18:31 |",
-            "|          |",
+            "| null     |",
             "+----------+",
         ];
         check_datetime!(Time32SecondArray, 1111, expected);
@@ -388,7 +388,7 @@ mod tests {
             "| f            |",
             "+--------------+",
             "| 03:05:11.111 |",
-            "|              |",
+            "| null         |",
             "+--------------+",
         ];
         check_datetime!(Time32MillisecondArray, 11111111, expected);
@@ -401,7 +401,7 @@ mod tests {
             "| f               |",
             "+-----------------+",
             "| 00:00:11.111111 |",
-            "|                 |",
+            "| null            |",
             "+-----------------+",
         ];
         check_datetime!(Time64MicrosecondArray, 11111111, expected);
@@ -414,7 +414,7 @@ mod tests {
             "| f                  |",
             "+--------------------+",
             "| 00:00:00.011111111 |",
-            "|                    |",
+            "| null               |",
             "+--------------------+",
         ];
         check_datetime!(Time64NanosecondArray, 11111111, expected);
@@ -462,7 +462,7 @@ mod tests {
             "| f     |",
             "+-------+",
             "| 1.01  |",
-            "|       |",
+            "| null  |",
             "| 2.00  |",
             "| 30.40 |",
             "+-------+",
@@ -498,7 +498,7 @@ mod tests {

         let table = pretty_format_batches(&[batch])?;
         let expected = vec![
-            "+------+", "| f    |", "+------+", "| 101  |", "|      |", "| 200  |",
+            "+------+", "| f    |", "+------+", "| 101  |", "| null |", "| 200  |",
             "| 3040 |", "+------+",
         ];

@@ -507,4 +507,63 @@ mod tests {

         Ok(())
     }
+
+    #[test]
+    fn test_pretty_format_struct() -> Result<()> {
+        let schema = Schema::new(vec![
+            Field::new(
+                "c1",
+                DataType::Struct(vec![
+                    Field::new("c11", DataType::Int32, false),
+                    Field::new(
+                        "c12",
+                        DataType::Struct(vec![Field::new("c121", DataType::Utf8, false)]),
+                        false,
+                    ),
+                ]),
+                false,
+            ),
+            Field::new("c2", DataType::Utf8, false),
+        ]);
+
+        let c1 = StructArray::from(vec![
+            (
+                Field::new("c11", DataType::Int32, false),
+                Arc::new(Int32Array::from(vec![Some(1), None, Some(5)])) as ArrayRef,
+            ),
+            (
+                Field::new(
+                    "c12",
+                    DataType::Struct(vec![Field::new("c121", DataType::Utf8, false)]),
+                    false,
+                ),
+                Arc::new(StructArray::from(vec![(
+                    Field::new("c121", DataType::Utf8, false),
+                    Arc::new(StringArray::from(vec![Some("e"), Some("f"), Some("g")]))
+                        as ArrayRef,
+                )])) as ArrayRef,
+            ),
+        ]);
+        let c2 = StringArray::from(vec![Some("a"), Some("b"), Some("c")]);
+
+        let batch =
+            RecordBatch::try_new(Arc::new(schema), vec![Arc::new(c1), Arc::new(c2)])
+                .unwrap();
+
+        let table = pretty_format_batches(&[batch])?;
+        let expected = vec![
+            r#"+-------------------------------------+----+"#,
+            r#"| c1                                  | c2 |"#,
+            r#"+-------------------------------------+----+"#,
+            r#"| {"c11": 1, "c12": {"c121": "e"}}    | a  |"#,

Contributor: This is cool; it is almost like JSON.

+            r#"| {"c11": null, "c12": {"c121": "f"}} | b  |"#,

Member: It would have been useful to also test a struct with validity, to make sure we do not miss that case.

Member Author: Will send a follow-up PR.

Member: Should we have a null in the fields' values but not in all other types?

Member Author: I don't have a strong opinion on this. I went this way because I feel it looks clearer than `"c11": ,`. Another option is to skip a column with a null value entirely in the display. Let me know if you feel strongly one way or the other; I am happy to send a PR to change the behavior as well.

+            r#"| {"c11": 5, "c12": {"c121": "g"}}    | c  |"#,
+            r#"+-------------------------------------+----+"#,
+        ];
+
+        let actual: Vec<&str> = table.lines().collect();
+        assert_eq!(expected, actual, "Actual result:\n{}", table);
+
+        Ok(())
+    }
 }
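
The review thread on the `"c11": null` row weighs three ways to show a null field: print `null`, leave the value empty, or skip the field. A small std-only sketch may make the trade-off easier to compare side by side. `NullStyle` and `render_struct` are hypothetical names introduced only for this illustration, and fields are simplified to optional integers; nothing here is part of arrow-rs.

```rust
/// Hypothetical switch between the three null renderings discussed in review.
#[derive(Clone, Copy)]
pub enum NullStyle {
    /// Print the literal text `null` (the behavior this PR chose).
    Literal,
    /// Print nothing after the colon, giving `"c11": ,`.
    Empty,
    /// Leave the field out of the output entirely.
    Skip,
}

/// Render `(name, value)` pairs as a brace-delimited struct string.
pub fn render_struct(fields: &[(&str, Option<i32>)], style: NullStyle) -> String {
    let parts: Vec<String> = fields
        .iter()
        .filter_map(|(name, value)| {
            let rendered = match (value, style) {
                (Some(v), _) => v.to_string(),
                (None, NullStyle::Literal) => "null".to_string(),
                (None, NullStyle::Empty) => String::new(),
                // Returning None from the closure drops this field.
                (None, NullStyle::Skip) => return None,
            };
            Some(format!("\"{}\": {}", name, rendered))
        })
        .collect();
    format!("{{{}}}", parts.join(", "))
}
```

With the fields `[("c11", None), ("c12", Some(7))]`, the three styles give `{"c11": null, "c12": 7}`, `{"c11": , "c12": 7}` and `{"c12": 7}` respectively.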