-
Notifications
You must be signed in to change notification settings - Fork 8
explode_json_to_cols
Hassan Syyid edited this page Feb 11, 2021
·
1 revision
Converts a JSON column that has an array value such as [{"Name": "First", "Value": "John"}, {"Name": "Last", "Value": "Smith"}]
into a dataframe with a column for each value. Note that the new series produced from the JSON will be de-duplicated and inner joined with the index.
explode_json_to_cols(df, column_name, **kwargs):
"""
:param df : the dataframe
:param column_name: the column that has the JSON in it.
:param reducer: a reducer that will convert the array of JSON into a panda series
:return df - a new data frame with the JSON line expanded into columns and rows
"""
Take a look at the CustomField
column. Below is an example of an entry
[{"DefinitionId": "1", "Name": "Crew #", "Type": "StringType", "StringValue": "102"}]
You can see this is JSON encoded data, specifying one custom field: Crew # with value 102
To explode this, we'll need to reduce this as we only care about the Name and StringValue. We can use gluestick's explode_json_to_cols function with an array_to_dict_reducer to accomplish this.
invoices = invoices.pipe(gs.explode_json_to_cols, 'CustomField', reducer=gs.array_to_dict_reducer('Name', 'StringValue'))
invoices[['Id', 'CustomField.Crew #']].head()