Commit

update plot notebooks, nesta colors and analysis readme
lizgzil committed Mar 21, 2024
1 parent f599292 commit bde99e4
Showing 7 changed files with 292 additions and 706 deletions.
196 changes: 17 additions & 179 deletions dap_prinz_green_jobs/notebooks/2x2_typologies.ipynb

Large diffs are not rendered by default.

85 changes: 84 additions & 1 deletion dap_prinz_green_jobs/notebooks/README.md
@@ -24,7 +24,7 @@ The below notebooks create graphs that are used in the Green Jobs Explorer.

### Regional Comparison

The `regional_comparison.ipynb` notebook contains code to create a choropleth of regional comparisons of the greenness measures for the Green Jobs Explorer.
The `regional_comparison_new.ipynb` notebook contains code to create a choropleth of regional comparisons of the greenness measures for the Green Jobs Explorer.

### 2x2 Typologies

@@ -37,3 +37,86 @@ The `new_skills_analysis.ipynb` notebook contains code to create graphs that exp
### Violin plots

The `violin_plots.ipynb` notebook contains code to create violin plots of the greenness measures for the Green Jobs Explorer.

### Common skills plots

The `common_skills.ipynb` notebook contains code to create bar plots of the most common skills and green skills.

### Similar occupations plots

The `Similar_occupations.ipynb` notebook contains code to create bar plots of the occupations most similar to a single occupation selected from a dropdown.

### How to give tooltips a maximum width

After creating the HTML plots for the Green Jobs Explorer using the above notebooks, you will need to manually edit every HTML file produced by Altair. This keeps the tooltips narrow enough that they do not overflow the edge of the plot (which happens by default).

You need to make two changes. First, add a tooltip rule to the `<style>` block:

```
<!DOCTYPE html>
<html>
<head>
<style>
#vis.vega-embed {
width: 100%;
display: flex;
}
#vis.vega-embed details,
#vis.vega-embed details summary {
position: relative;
}
#vg-tooltip-element.vg-tooltip.custom-theme {
max-width: 40%;
}
</style>
```

That is, add the rule

```
#vg-tooltip-element.vg-tooltip.custom-theme {
max-width: 40%;
}
```

to the `<style>` block. You can set `max-width` as a percentage or as a pixel value (e.g. `200px`).

Second, pass a custom tooltip theme in the embed options by using:

```
var tooltipOptions = {
theme: 'custom'
};
var embedOpt = {"mode": "vega-lite", tooltip: tooltipOptions};
```

at the end of the HTML, instead of just

```
var embedOpt = {"mode": "vega-lite"};
```
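The two manual edits above could also be scripted. The following is a minimal sketch, not part of the repository: the function names `add_tooltip_max_width` and `patch_file` are hypothetical, and it assumes each exported file contains a `</style>` tag and the default `embedOpt` line exactly as shown above (this may differ between Altair versions).

```python
from pathlib import Path

# CSS rule and embed options copied from the manual steps above.
TOOLTIP_CSS = """#vg-tooltip-element.vg-tooltip.custom-theme {
  max-width: 40%;
}
"""
DEFAULT_EMBED_OPT = 'var embedOpt = {"mode": "vega-lite"};'
CUSTOM_EMBED_OPT = (
    "var tooltipOptions = {\n"
    "  theme: 'custom'\n"
    "};\n"
    'var embedOpt = {"mode": "vega-lite", tooltip: tooltipOptions};'
)


def add_tooltip_max_width(html: str) -> str:
    """Return the HTML with the tooltip rule and custom tooltip theme added.

    Hypothetical helper: assumes the default Altair export layout.
    """
    if "</style>" not in html or DEFAULT_EMBED_OPT not in html:
        raise ValueError("HTML does not look like an unpatched Altair export")
    # Insert the rule just before the first closing </style> tag.
    html = html.replace("</style>", TOOLTIP_CSS + "</style>", 1)
    # Swap the default embed options for ones that set the tooltip theme.
    return html.replace(DEFAULT_EMBED_OPT, CUSTOM_EMBED_OPT, 1)


def patch_file(path: str) -> None:
    """Patch one exported HTML file in place (hypothetical helper)."""
    p = Path(path)
    patched = add_tooltip_max_width(p.read_text(encoding="utf-8"))
    p.write_text(patched, encoding="utf-8")
```

Calling `patch_file` on a file that has already been patched raises `ValueError`, because the default `embedOpt` line is no longer present, so a file is never patched twice.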

### Add dropdown above plot

By default, Altair puts the dropdown selector box at the bottom left of the plot. To move it above the plot, add:

```
form.vega-bindings {
position: absolute;
left: 0px;
top: -10px;
}
```

to the `<style>` block of each HTML file.
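This edit can be scripted in the same way as the manual step. The sketch below is an assumption-laden illustration rather than project code: `move_dropdown_above_plot` is a hypothetical name, and it assumes the exported HTML already contains a `<style>` block.

```python
# CSS rule copied from the manual step above.
DROPDOWN_CSS = """form.vega-bindings {
  position: absolute;
  left: 0px;
  top: -10px;
}
"""


def move_dropdown_above_plot(html: str) -> str:
    """Return the HTML with the dropdown-positioning rule added.

    Hypothetical helper: leaves the HTML unchanged if the rule is
    already present, so it is safe to run more than once.
    """
    if DROPDOWN_CSS in html:
        return html
    if "</style>" not in html:
        raise ValueError("no <style> block found to extend")
    # Insert the rule just before the first closing </style> tag.
    return html.replace("</style>", DROPDOWN_CSS + "</style>", 1)
```

Because the function takes and returns a string, it can be applied to the contents of each exported file before writing the result back.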

You may wish to leave room for the dropdown between the chart title and the plot. If so, add a title offset when you save the plot in your Python code:

```
fig.configure_title(offset=100).save("plot.html")
```
92 changes: 58 additions & 34 deletions dap_prinz_green_jobs/notebooks/Similar_occupations.ipynb
@@ -21,7 +21,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"id": "0b2bdc71-1edf-4f6f-8ad5-8dccc70e2688",
"metadata": {},
"outputs": [],
@@ -60,7 +60,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 3,
"id": "6c9d06db-f4e1-4b54-a9b3-45e60837172e",
"metadata": {},
"outputs": [],
@@ -70,15 +70,15 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"id": "6fb3f385-2286-4fa7-b4e0-e31710c54b34",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2024-03-07 09:26:54,719 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials\n"
"2024-03-20 16:29:46,245 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials\n"
]
}
],
@@ -92,15 +92,15 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 5,
"id": "c54dcf6e-cd28-43ce-b293-516b5431e15e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2024-03-07 09:26:57,241 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials\n"
"2024-03-20 16:29:47,616 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials\n"
]
}
],
@@ -111,12 +111,16 @@
" )\n",
"\n",
"occ_agg = occ_agg[occ_agg['clean_soc_name']!='Betting shop managers']\n",
"\n",
"# Let's not include so many of the occupations - it makes the plot laggy\n",
"occ_agg = occ_agg[occ_agg['num_job_ads']>500]\n",
"\n",
"occ_agg.reset_index(inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 6,
"id": "d7d2e90a-9796-4b3c-bb90-e5220fa41ffb",
"metadata": {},
"outputs": [],
@@ -129,7 +133,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 7,
"id": "7cf555e4-9b0b-4d67-991b-1366429c091e",
"metadata": {},
"outputs": [],
@@ -140,17 +144,17 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 8,
"id": "fdf6f78e-7fee-46f1-ac24-2d0e1b02ccbf",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1346"
"1329"
]
},
"execution_count": 10,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
@@ -169,7 +173,7 @@
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": 9,
"id": "7e53c724-b55f-488a-bce2-81d2dd411384",
"metadata": {},
"outputs": [],
@@ -179,7 +183,7 @@
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": 10,
"id": "ada38051-e143-4ac3-9331-0341d28cde9b",
"metadata": {},
"outputs": [],
@@ -195,23 +199,22 @@
" collated_df = pd.concat([collated_df, occ_sim_details_df])\n",
"\n",
"# And only use data with an ok similarity\n",
"collated_df = collated_df[collated_df['Occupation_num_jobs_ads']>50]\n",
"collated_df = collated_df[collated_df['similarity']>0.75]"
]
},
{
"cell_type": "code",
"execution_count": 27,
"execution_count": 11,
"id": "0960bbd8-cee9-402f-95b6-25065b8f705b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3931"
"3799"
]
},
"execution_count": 27,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
@@ -222,7 +225,7 @@
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": 12,
"id": "cc637aca-5a73-4e49-8e97-419c623e56a1",
"metadata": {},
"outputs": [],
@@ -241,15 +244,15 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": 13,
"id": "5ab1a3ed-f6f8-4ef1-84fd-fab53ebd9edb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/Users/elizabethgallagher/Code/dap_prinz_green_jobs/outputs/figures/green_jobs_explorer/240228 directory already exists\n"
"/Users/elizabethgallagher/Code/dap_prinz_green_jobs/outputs/figures/green_jobs_explorer/240320 directory already exists\n"
]
}
],
@@ -267,29 +270,39 @@
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": 14,
"id": "4fc7d9b4-0bc4-406f-a453-d4912fc4b910",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3931\n",
"3883\n"
"3799\n",
"2035\n"
]
}
],
"source": [
"print(len(collated_df))\n",
"collated_df = collated_df[collated_df['Occupation_num_jobs_ads']>=100]\n",
"collated_df = collated_df[collated_df['Occupation_num_jobs_ads']>2000]\n",
"collated_df.sort_values(by='Occupation', inplace=True)\n",
"print(len(collated_df))"
]
},
{
"cell_type": "code",
"execution_count": 31,
"execution_count": 59,
"id": "addf6707-a10f-452b-9a57-f237a14e20ee",
"metadata": {},
"outputs": [],
"source": [
"collated_df['av_perc_green_skills_fixed'] = collated_df['av_perc_green_skills'].fillna(-1)"
]
},
{
"cell_type": "code",
"execution_count": 75,
"id": "221df4c9-6984-48e7-bb0f-4f15dcf12ce7",
"metadata": {},
"outputs": [],
@@ -303,12 +316,14 @@
" bind=select_box,\n",
")\n",
"\n",
"similar_skills_plot = alt.Chart(collated_df).mark_bar().encode(\n",
" x=alt.X('similarity', title='Skill similarity'),\n",
" y=alt.Y('SOC_2020_EXT_name_wrapped', sort='-x', title='', axis=alt.Axis(labelLimit=300)\n",
" ),\n",
"similar_skills_plot = alt.Chart(collated_df,\n",
" # padding={\"left\": 10, \"top\": 100, \"right\": 10, \"bottom\": 10}\n",
" ).mark_bar().encode(\n",
" x=alt.X('similarity', title='Skill similarity', scale=alt.Scale(domain=[0, 1])),\n",
" y=alt.Y('SOC_2020_EXT_name', sort='-x', title='', axis=alt.Axis(labelLimit=300)\n",
" ), \n",
" color=alt.Color(\n",
" 'av_perc_green_skills',\n",
" 'av_perc_green_skills_fixed',\n",
" title=[\"Average percentage\", \"of green skills\"],\n",
" scale=alt.Scale(\n",
" scheme='goldgreen',\n",
@@ -318,7 +333,6 @@
" ),\n",
" legend=None,\n",
" ),\n",
" \n",
" tooltip=[\n",
" alt.Tooltip(\"SOC_2020_EXT_name\", title=\"Occupation\"),\n",
" alt.Tooltip(\"similarity\", title=\"Similarity score\", format=\".2\"),\n",
@@ -332,7 +346,7 @@
},
{
"cell_type": "code",
"execution_count": 32,
"execution_count": 86,
"id": "9cf93f61-45f5-4f59-ab5f-dda4f316cf7f",
"metadata": {},
"outputs": [],
@@ -342,17 +356,27 @@
" chart_title=\"Most similar occupations based off skill similarity\",\n",
" fontsize_normal=16,\n",
" fontsize_title=18,\n",
").save(\n",
").configure_title(offset=100).save(\n",
" f\"{graph_dir}/similar_skills_plot.html\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 87,
"id": "274fe9ac-fdc2-4a1f-80f9-58d9bb27f306",
"metadata": {},
"outputs": [],
"source": [
"# configure_title will allow us to put the dropdown above the plot (done later in the html)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dd04147e-4fb5-4cef-9021-83fe8207372b",
"metadata": {},
"outputs": [],
"source": []
}
],
