Skip to content

explode_json_to_cols

Hassan Syyid edited this page Feb 11, 2021 · 1 revision

explode_json_to_cols

Converts a JSON column that has an array value such as [{"Name": "First", "Value": "John"}, {"Name": "Last", "Value": "Smith"}] into a dataframe with a column for each value. Note that the new series produced from the JSON will be de-duplicated and inner joined with the index.

Definition

explode_json_to_cols(df, column_name, **kwargs):
"""
               :param df : the dataframe
               :param column_name: the column that has the JSON in it.
               :param reducer: a reducer that will convert the array of JSON into a panda series
               :return df - a new data frame with the JSON line expanded into columns and rows
"""

Example

Input

input

Objective

Take a look at the CustomField column. Below is an example of an entry

[{"DefinitionId": "1", "Name": "Crew #", "Type": "StringType", "StringValue": "102"}]

You can see this is JSON encoded data, specifying one custom field: Crew # with value 102

To explode this, we'll need to reduce this as we only care about the Name and StringValue. We can use gluestick's explode_json_to_cols function with an array_to_dict_reducer to accomplish this.

invoices = invoices.pipe(gs.explode_json_to_cols, 'CustomField', reducer=gs.array_to_dict_reducer('Name', 'StringValue'))
invoices[['Id', 'CustomField.Crew #']].head()

Output

out

Clone this wiki locally