To create a new project, first right-click on a folder in the Text editor and choose New Mage project. Then open Settings and click on Register project.
Opening a text editor:
- Go to the command center (at the top)
- Type "text editor"
The project unit_1_data_preparation now has an empty pipeline, which can be developed further using blocks. The first block we'll create is an ingestion block, which uses Python code to download the parquet files for January to March from the green taxi dataset and concatenate them. Once that is done, generate a series of graphs and charts useful for data profiling.
- Note: If the time chart isn't displayed, insert the following snippet
df['lpep_pickup_datetime_cleaned'] = df['lpep_pickup_datetime'].astype(np.int64) // 10**9
just above the dfs.append(df) line in ingest.py.
Code:
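As a rough illustration of the download-and-concatenate step described above, here is a minimal sketch in plain pandas. It is not the course's ingest.py: the URL pattern, the helper names month_urls and ingest_files, and the read parameter are all assumptions made for this example. Inside Mage, the same logic would sit in the block's loader function.

```python
from typing import Callable, Iterable, List

import pandas as pd

# Assumed URL pattern for the monthly green taxi parquet files; adjust it to
# wherever the files actually live.
URL_TEMPLATE = (
    "https://d37ci6vzurychx.cloudfront.net/trip-data/"
    "green_tripdata_{year}-{month:02d}.parquet"
)


def month_urls(year: int, months: Iterable[int],
               template: str = URL_TEMPLATE) -> List[str]:
    """Build one URL per month (hypothetical helper)."""
    return [template.format(year=year, month=month) for month in months]


def ingest_files(urls: Iterable[str],
                 read: Callable[[str], pd.DataFrame] = pd.read_parquet
                 ) -> pd.DataFrame:
    """Read every file and stack the rows into a single DataFrame."""
    dfs = []
    for url in urls:
        df = read(url)
        # The optional time-chart fix from the note above would go here,
        # just above dfs.append(df).
        dfs.append(df)
    # ignore_index=True gives the combined frame a fresh 0..n-1 index.
    return pd.concat(dfs, ignore_index=True)
```

Calling ingest_files(month_urls(year, range(1, 4))) would then fetch January to March for the chosen year. The read parameter exists only so the function can be tried with a stand-in reader instead of a real download.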
Utility functions are already created in the utils folder. They will then be imported into the transformer block.
Code:
To see the correct histogram, change the last two lines of the default code to:
col = 'trip_distance'
x = df_1[df_1[col] <= 20][col]
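To see why this filter helps, here is a small sketch with made-up numbers. The toy df_1 below is an assumption standing in for the block's real output, and np.histogram is used only to count values per bin. A couple of extreme distances stretch the axis so far that almost every trip lands in the first bin; keeping trips of at most 20 spreads the remaining values across the bins.

```python
import numpy as np
import pandas as pd

# Toy data (made up): mostly short trips plus two extreme outliers.
df_1 = pd.DataFrame(
    {'trip_distance': [0.5, 1.2, 2.0, 3.1, 7.5, 19.9, 250.0, 12000.0]}
)

col = 'trip_distance'
x = df_1[df_1[col] <= 20][col]  # keep only trips of at most 20

# Count values per bin, with and without the filter.
counts_all, _ = np.histogram(df_1[col], bins=5)
counts_filtered, _ = np.histogram(x, bins=5)

print("unfiltered:", counts_all.tolist())
print("filtered:  ", counts_filtered.tolist())
```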