added better variable names
micdavis committed Sep 12, 2022
1 parent 8d57e24 commit 426a4cb
Showing 1 changed file with 8 additions and 7 deletions.
@@ -150,7 +150,8 @@
 " current_year_df = data.loc[data[\"work_year\"]==year]\n",
 " current_year_df = current_year_df.drop(\"work_year\", axis=1)\n",
 " individual_dataframes.append(current_year_df)\n",
-"individual_dataframes[0]"
+"year_2020_salary = individual_dataframes[0]\n",
+"year_2022_salary = individual_dataframes[2]"
 ],
 "metadata": {
 "collapsed": false,
@@ -179,8 +180,8 @@
 "profiler_options = dp.ProfilerOptions()\n",
 "profiler_options.set({\"data_labeler.is_enabled\": False})\n",
 "\n",
-"profile = dp.Profiler(individual_dataframes[0], len(individual_dataframes[0]), options=profiler_options)\n",
-"profile.save(filepath='previous_profile.pkl')\n",
+"profile = dp.Profiler(year_2020_salary, len(year_2020_salary), options=profiler_options)\n",
+"profile.save(filepath='year_2020_salary_profile.pkl')\n",
 "report = profile.report(report_options={\"output_format\": \"compact\"})"
 ],
 "metadata": {
@@ -219,7 +220,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"The data owner has been told by his company that median salary increases were about **10000** from **2020** to **2022**. So the data owner wants to for their range around this number. With the validator below, an expectation is set up with a difference between the `median` `salary_in_usd` of at least **7500** and no more than **12500**. Meaning, this validator will check if the `median` `salary_in_usd` has increased by **7500** to **12500** from **2020 to **2022** across all the companies they have data on."
+"The data owner has been told by his company that median salary increases were about **10000** from **2020** to **2022**. So the data owner wants to for their range around this number. With the validator below, an expectation is set up with a difference between the `median` `salary_in_usd` of at least **7500** and no more than **12500**. Meaning, this validator will check if the `median` `salary_in_usd` has increased by **7500** to **12500** from **2020** to **2022** across all the companies they have data on."
 ],
 "metadata": {
 "collapsed": false,
@@ -233,9 +234,9 @@
 "execution_count": null,
 "outputs": [],
 "source": [
-"validator = build_pandas_validator_with_data(individual_dataframes[1])\n",
+"validator = build_pandas_validator_with_data(year_2022_salary)\n",
 "results = validator.expect_profile_numeric_columns_diff_between_inclusive_threshold_range(\n",
-" profile_path='previous_profile.pkl',\n",
+" profile_path='year_2020_salary_profile.pkl',\n",
 " limit_check_report_keys={\n",
 " \"salary_in_usd\": {\n",
 " \"median\": {\"lower\": 7500, \"upper\": 12500},\n",
@@ -254,7 +255,7 @@
 "cell_type": "markdown",
 "source": [
 "### Results\n",
-"From the output below, the data owner can see that the expectation has an unexpected value. The result shows that the diff between the two profiles is slightly less than **7000**. This indicates to the data owner that the salary growth trends in competing companies are still below their own company."
+"From the output below, the data owner can see that the expectation has an unexpected value. The result shows that the diff between the two profiles is significantly more than the upperbound at about **44000**. This indicates to the data owner that the salary growth trends in competing companies have increased more than his own company in the last two years."
 ],
 "metadata": {
 "collapsed": false,
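
The expectation in the changed cells compares the median `salary_in_usd` of the 2022 data against the saved 2020 profile and passes only if the difference lies between 7500 and 12500 inclusive. Below is a minimal sketch of that arithmetic in plain pandas. It is not the validator's actual implementation: the helper name `median_diff_within_range`, the sign convention (current minus previous), and the sample salaries are all assumptions made for illustration.

```python
import pandas as pd


def median_diff_within_range(previous, current, lower, upper):
    """Hypothetical helper: compare the medians of two salary columns.

    Returns the difference (current median minus previous median, an assumed
    sign convention) and whether it falls inside [lower, upper] inclusive.
    """
    diff = float(current.median() - previous.median())
    return diff, lower <= diff <= upper


# Made-up salaries, not taken from the notebook's dataset.
salary_2020 = pd.Series([60000, 70000, 80000])
salary_2022 = pd.Series([100000, 114000, 130000])

diff, passed = median_diff_within_range(salary_2020, salary_2022, 7500, 12500)
print(diff, passed)
```

With these made-up values the difference is 44000, well above the upper bound of 12500, so the check fails. This is the kind of outcome the updated Results cell describes.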