📊 energy: Get Eurostat data on energy prices #3499

Merged on Dec 4, 2024 (75 commits)

pabloarosado (Contributor) commented Nov 5, 2024

Main changes:

  • Add Eurostat data on gas and electricity prices in Europe.
  • Add Ember data on wholesale electricity prices in Europe.
  • Add IEA data on fossil fuel subsidies.
  • Create an mdim data page of energy prices, which depends on data://grapher steps (and doesn't read from DB).
  • Add a helper function to explode views in the multidim module.
  • Minor fix in PathFinder.

Review:
@lucasrodes and @Marigold, there's no need to review every step (though feel free to). However, I'd appreciate your input on the following:

  • Please have a look at the mdim data page and give feedback.
  • Please review the proposed alternative way to create mdim steps without accessing DB in etl/steps/export/multidim/energy/latest/energy_prices.py.
  • Please review the changes in etl/multidim.py.

Thank you very much!

TODO (for Pablo):

  • Clarify the meaning of "euros" and "PPS" in the metadata.
  • Complete remaining TODOs in the code.
  • Approve current drafts in chart diff (and keep them as drafts for now).

WARNING: This work is not yet finished and will need to wait until Hannah is back. Still, it's probably better to merge what we have now to avoid future conflicts, and to improve the drafts and the energy prices mdim data page in a separate PR.

owidbot (Contributor) commented Nov 5, 2024

Quick links (staging server):

Site Dev Site Preview Admin Wizard Docs

Login: ssh owid@staging-site-get-eurostat-data-on-energy-prices

chart-diff: ✅
  • 5/5 reviewed charts
  • Modified: 0/0
  • New: 5/5
  • Rejected: 0
data-diff: ❌ Found differences
= Dataset garden/antibiotics/2024-10-25/esvac_sales_corrected
  = Table esvac_sales_corrected
⚠ Error: Index must be unique.
= Dataset garden/artificial_intelligence/2023-06-14/ai_deepfakes
  = Table ai_deepfakes
⚠ Error: Index must be unique.
⚠ Error: Index must be unique.
= Dataset garden/artificial_intelligence/2024-02-15/epoch_llms
  = Table epoch_llms
    ~ Column dataset_size__tokens (changed metadata)
-       -       Owen, David. (2023). Large Language Model performance and compute, Epoch (2023) [Data set]. In Extrapolating performance in language modeling benchmarks. Published online at epoch.ai. Retrieved from: 'https://epoch.ai/blog/extrapolating-performance-in-language-modelling-benchmarks' .
        ?                                                                                                                                                                                          ^^^^                               ^^^
+       +       Owen, David. (2023). Large Language Model performance and compute, Epoch (2023) [Data set]. In Extrapolating performance in language modeling benchmarks. Published online at epochai.org. Retrieved from: 'https://epochai.org/blog/extrapolating-performance-in-language-modelling-benchmarks' .
        ?                                                                                                                                                                                          ^^^^^^^                               ^^^^^^
-       -     url_main: https://epoch.ai/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                            -
+       +     url_main: https://epochai.org/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                              ++++
-       -       url: https://epoch.ai/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                         -
+       +       url: https://epochai.org/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                           ++++
    ~ Column mmlu_avg (changed metadata)
-       -       Owen, David. (2023). Large Language Model performance and compute, Epoch (2023) [Data set]. In Extrapolating performance in language modeling benchmarks. Published online at epoch.ai. Retrieved from: 'https://epoch.ai/blog/extrapolating-performance-in-language-modelling-benchmarks' .
        ?                                                                                                                                                                                          ^^^^                               ^^^
+       +       Owen, David. (2023). Large Language Model performance and compute, Epoch (2023) [Data set]. In Extrapolating performance in language modeling benchmarks. Published online at epochai.org. Retrieved from: 'https://epochai.org/blog/extrapolating-performance-in-language-modelling-benchmarks' .
        ?                                                                                                                                                                                          ^^^^^^^                               ^^^^^^
-       -     url_main: https://epoch.ai/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                            -
+       +     url_main: https://epochai.org/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                              ++++
-       -       url: https://epoch.ai/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                         -
+       +       url: https://epochai.org/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                           ++++
    ~ Column model_size__parameters (changed metadata)
-       -       Owen, David. (2023). Large Language Model performance and compute, Epoch (2023) [Data set]. In Extrapolating performance in language modeling benchmarks. Published online at epoch.ai. Retrieved from: 'https://epoch.ai/blog/extrapolating-performance-in-language-modelling-benchmarks' .
        ?                                                                                                                                                                                          ^^^^                               ^^^
+       +       Owen, David. (2023). Large Language Model performance and compute, Epoch (2023) [Data set]. In Extrapolating performance in language modeling benchmarks. Published online at epochai.org. Retrieved from: 'https://epochai.org/blog/extrapolating-performance-in-language-modelling-benchmarks' .
        ?                                                                                                                                                                                          ^^^^^^^                               ^^^^^^
-       -     url_main: https://epoch.ai/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                            -
+       +     url_main: https://epochai.org/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                              ++++
-       -       url: https://epoch.ai/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                         -
+       +       url: https://epochai.org/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                           ++++
    ~ Column organisation (changed metadata)
-       -       Owen, David. (2023). Large Language Model performance and compute, Epoch (2023) [Data set]. In Extrapolating performance in language modeling benchmarks. Published online at epoch.ai. Retrieved from: 'https://epoch.ai/blog/extrapolating-performance-in-language-modelling-benchmarks' .
        ?                                                                                                                                                                                          ^^^^                               ^^^
+       +       Owen, David. (2023). Large Language Model performance and compute, Epoch (2023) [Data set]. In Extrapolating performance in language modeling benchmarks. Published online at epochai.org. Retrieved from: 'https://epochai.org/blog/extrapolating-performance-in-language-modelling-benchmarks' .
        ?                                                                                                                                                                                          ^^^^^^^                               ^^^^^^
-       -     url_main: https://epoch.ai/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                            -
+       +     url_main: https://epochai.org/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                              ++++
-       -       url: https://epoch.ai/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                         -
+       +       url: https://epochai.org/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                           ++++
    ~ Column training_computation_petaflop (changed metadata)
-       -       Owen, David. (2023). Large Language Model performance and compute, Epoch (2023) [Data set]. In Extrapolating performance in language modeling benchmarks. Published online at epoch.ai. Retrieved from: 'https://epoch.ai/blog/extrapolating-performance-in-language-modelling-benchmarks' .
        ?                                                                                                                                                                                          ^^^^                               ^^^
+       +       Owen, David. (2023). Large Language Model performance and compute, Epoch (2023) [Data set]. In Extrapolating performance in language modeling benchmarks. Published online at epochai.org. Retrieved from: 'https://epochai.org/blog/extrapolating-performance-in-language-modelling-benchmarks' .
        ?                                                                                                                                                                                          ^^^^^^^                               ^^^^^^
-       -     url_main: https://epoch.ai/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                            -
+       +     url_main: https://epochai.org/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                              ++++
-       -       url: https://epoch.ai/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                         -
+       +       url: https://epochai.org/blog/extrapolating-performance-in-language-modelling-benchmarks
        ?                           ++++
= Dataset garden/artificial_intelligence/2024-06-06/epoch_compute_cost
  = Table epoch_compute_cost
    ~ Column cost__inflation_adjusted (changed metadata)
-       -     url_main: https://epoch.ai/blog/how-much-does-it-cost-to-train-frontier-ai-models
        ?                            -
+       +     url_main: https://epochai.org/blog/how-much-does-it-cost-to-train-frontier-ai-models
        ?                              ++++
    ~ Column domain (changed metadata)
-       -     url_main: https://epoch.ai/blog/how-much-does-it-cost-to-train-frontier-ai-models
        ?                            -
+       +     url_main: https://epochai.org/blog/how-much-does-it-cost-to-train-frontier-ai-models
        ?                              ++++
    ~ Column publication_date (changed metadata)
-       -     url_main: https://epoch.ai/blog/how-much-does-it-cost-to-train-frontier-ai-models
        ?                            -
+       +     url_main: https://epochai.org/blog/how-much-does-it-cost-to-train-frontier-ai-models
        ?                              ++++
= Dataset garden/artificial_intelligence/2024-11-03/epoch_aggregates_domain
  = Table epoch_aggregates_domain
    ~ Column cumulative_count (changed metadata)
-       -   Describes the specific area, application, or field in which an AI system is designed to operate. An AI system can operate in more than one domain, thus contributing to the count for multiple domains. The 2024 data is incomplete and was last updated 03 November 2024.
        ?                                                                                                                                                                                                                                                            ^^
+       +   Describes the specific area, application, or field in which an AI system is designed to operate. An AI system can operate in more than one domain, thus contributing to the count for multiple domains. The 2024 data is incomplete and was last updated 6 November 2024.
        ?                                                                                                                                                                                                                                                            ^
    ~ Column yearly_count (changed metadata)
-       -   Describes the specific area, application, or field in which an AI system is designed to operate. An AI system can operate in more than one domain, thus contributing to the count for multiple domains. The 2024 data is incomplete and was last updated 03 November 2024.
        ?                                                                                                                                                                                                                                                            ^^
+       +   Describes the specific area, application, or field in which an AI system is designed to operate. An AI system can operate in more than one domain, thus contributing to the count for multiple domains. The 2024 data is incomplete and was last updated 6 November 2024.
        ?                                                                                                                                                                                                                                                            ^
= Dataset garden/artificial_intelligence/2024-11-03/epoch_compute_intensive_countries
  = Table epoch_compute_intensive_countries
    ~ Column cumulative_count (changed metadata)
-       -   Refers to the location of the primary organization with which the authors of a large-scale AI systems are affiliated. The 2024 data is incomplete and was last updated 03 November 2024.
        ?                                                                                                                                                                          ^^
+       +   Refers to the location of the primary organization with which the authors of a large-scale AI systems are affiliated. The 2024 data is incomplete and was last updated 6 November 2024.
        ?                                                                                                                                                                          ^
    ~ Column yearly_count (changed metadata)
-       -   Refers to the location of the primary organization with which the authors of a large-scale AI systems are affiliated. The 2024 data is incomplete and was last updated 03 November 2024.
        ?                                                                                                                                                                          ^^
+       +   Refers to the location of the primary organization with which the authors of a large-scale AI systems are affiliated. The 2024 data is incomplete and was last updated 6 November 2024.
        ?                                                                                                                                                                          ^
= Dataset garden/artificial_intelligence/2024-11-03/epoch_compute_intensive_domain
  = Table epoch_compute_intensive_domain
    ~ Column cumulative_count (changed metadata)
-       -   Describes the specific area, application, or field in which a large-scale AI model is designed to operate. The 2024 data is incomplete and was last updated 03 November 2024.
        ?                                                                                                                                                               ^^
+       +   Describes the specific area, application, or field in which a large-scale AI model is designed to operate. The 2024 data is incomplete and was last updated 6 November 2024.
        ?                                                                                                                                                               ^
    ~ Column yearly_count (changed metadata)
-       -   Describes the specific area, application, or field in which a large-scale AI model is designed to operate. The 2024 data is incomplete and was last updated 03 November 2024.
        ?                                                                                                                                                               ^^
+       +   Describes the specific area, application, or field in which a large-scale AI model is designed to operate. The 2024 data is incomplete and was last updated 6 November 2024.
        ?                                                                                                                                                               ^
+ Dataset garden/eurostat/2024-11-05/gas_and_electricity_prices
+ + Table gas_and_electricity_price_components_euro_flat
+   + Column electricity_household_capacity_taxes
+   + Column electricity_household_environmental_taxes
+   + Column electricity_household_network_costs
+   + Column electricity_household_nuclear_taxes
+   + Column electricity_household_other
+   + Column electricity_household_renewable_taxes
+   + Column electricity_household_taxes_fees_levies_and_charges
+   + Column electricity_household_total_price_including_taxes
+   + Column electricity_household_value_added_tax_vat
+   + Column electricity_non_household_capacity_taxes
+   + Column electricity_non_household_environmental_taxes
+   + Column electricity_non_household_network_costs
+   + Column electricity_non_household_nuclear_taxes
+   + Column electricity_non_household_other
+   + Column electricity_non_household_renewable_taxes
+   + Column electricity_non_household_taxes_fees_levies_and_charges
+   + Column electricity_non_household_total_price_including_taxes
+   + Column electricity_non_household_value_added_tax_vat
+   + Column electricity_household_capacity_taxes_allowances
+   + Column electricity_household_energy_and_supply
+   + Column electricity_household_environmental_taxes_allowance
+   + Column electricity_household_nuclear_taxes_allowance
+   + Column electricity_household_other_allowance
+   + Column electricity_household_renewable_taxes_allowance
+   + Column electricity_non_household_energy_and_supply
+   + Column gas_household_capacity_taxes
+   + Column gas_household_energy_and_supply
+   + Column gas_household_environmental_taxes
+   + Column gas_household_network_costs
+   + Column gas_household_other
+   + Column gas_household_renewable_taxes
+   + Column gas_household_taxes_fees_levies_and_charges
+   + Column gas_household_total_price_including_taxes
+   + Column gas_household_value_added_tax_vat
+   + Column gas_non_household_capacity_taxes
+   + Column gas_non_household_energy_and_supply
+   + Column gas_non_household_environmental_taxes
+   + Column gas_non_household_network_costs
+   + Column gas_non_household_other
+   + Column gas_non_household_renewable_taxes
+   + Column gas_non_household_taxes_fees_levies_and_charges
+   + Column gas_non_household_total_price_including_taxes
+   + Column gas_non_household_value_added_tax_vat
+   + Column electricity_household_taxes_fees_levies_and_charges_allowance
+ + Table gas_and_electricity_price_components_pps_flat
+   + Column electricity_household_capacity_taxes_pps
+   + Column electricity_household_environmental_taxes_pps
+   + Column electricity_household_network_costs_pps
+   + Column electricity_household_nuclear_taxes_pps
+   + Column electricity_household_other_pps
+   + Column electricity_household_renewable_taxes_pps
+   + Column electricity_household_taxes_fees_levies_and_charges_pps
+   + Column electricity_household_total_price_including_taxes_pps
+   + Column electricity_household_value_added_tax_vat_pps
+   + Column electricity_non_household_capacity_taxes_pps
+   + Column electricity_non_household_environmental_taxes_pps
+   + Column electricity_non_household_network_costs_pps
+   + Column electricity_non_household_nuclear_taxes_pps
+   + Column electricity_non_household_other_pps
+   + Column electricity_non_household_renewable_taxes_pps
+   + Column electricity_non_household_taxes_fees_levies_and_charges_pps
+   + Column electricity_non_household_total_price_including_taxes_pps
+   + Column electricity_non_household_value_added_tax_vat_pps
+   + Column electricity_household_capacity_taxes_allowances_pps
+   + Column electricity_household_energy_and_supply_pps
+   + Column electricity_household_environmental_taxes_allowance_pps
+   + Column electricity_household_nuclear_taxes_allowance_pps
+   + Column electricity_household_other_allowance_pps
+   + Column electricity_household_renewable_taxes_allowance_pps
+   + Column electricity_non_household_energy_and_supply_pps
+   + Column gas_household_capacity_taxes_pps
+   + Column gas_household_energy_and_supply_pps
+   + Column gas_household_environmental_taxes_pps
+   + Column gas_household_network_costs_pps
+   + Column gas_household_other_pps
+   + Column gas_household_renewable_taxes_pps
+   + Column gas_household_taxes_fees_levies_and_charges_pps
+   + Column gas_household_total_price_including_taxes_pps
+   + Column gas_household_value_added_tax_vat_pps
+   + Column gas_non_household_capacity_taxes_pps
+   + Column gas_non_household_energy_and_supply_pps
+   + Column gas_non_household_environmental_taxes_pps
+   + Column gas_non_household_network_costs_pps
+   + Column gas_non_household_other_pps
+   + Column gas_non_household_renewable_taxes_pps
+   + Column gas_non_household_taxes_fees_levies_and_charges_pps
+   + Column gas_non_household_total_price_including_taxes_pps
+   + Column gas_non_household_value_added_tax_vat_pps
+   + Column electricity_household_taxes_fees_levies_and_charges_allowance_pps
+ + Table gas_and_electricity_prices
+   + Column year
+   + Column price_euro
+   + Column price_pps
+ + Table gas_and_electricity_prices_euro_flat
+   + Column electricity_non_household_all_taxes_and_levies_included
+   + Column electricity_non_household_excluding_vat_and_other_recoverable_taxes_and_levies
+   + Column electricity_non_household_excluding_taxes_and_levies
+   + Column electricity_household_all_taxes_and_levies_included
+   + Column electricity_household_excluding_vat_and_other_recoverable_taxes_and_levies
+   + Column electricity_household_excluding_taxes_and_levies
+   + Column gas_non_household_all_taxes_and_levies_included
+   + Column gas_non_household_excluding_vat_and_other_recoverable_taxes_and_levies
+   + Column gas_non_household_excluding_taxes_and_levies
+   + Column gas_household_all_taxes_and_levies_included
+   + Column gas_household_excluding_vat_and_other_recoverable_taxes_and_levies
+   + Column gas_household_excluding_taxes_and_levies
+ + Table gas_and_electricity_prices_pps_flat
+   + Column electricity_non_household_all_taxes_and_levies_included_pps
+   + Column electricity_non_household_excluding_vat_and_other_recoverable_taxes_and_levies_pps
+   + Column electricity_non_household_excluding_taxes_and_levies_pps
+   + Column electricity_household_all_taxes_and_levies_included_pps
+   + Column electricity_household_excluding_vat_and_other_recoverable_taxes_and_levies_pps
+   + Column electricity_household_excluding_taxes_and_levies_pps
+   + Column gas_non_household_all_taxes_and_levies_included_pps
+   + Column gas_non_household_excluding_vat_and_other_recoverable_taxes_and_levies_pps
+   + Column gas_non_household_excluding_taxes_and_levies_pps
+   + Column gas_household_all_taxes_and_levies_included_pps
+   + Column gas_household_excluding_vat_and_other_recoverable_taxes_and_levies_pps
+   + Column gas_household_excluding_taxes_and_levies_pps
= Dataset garden/faostat/2024-03-14/faostat_fa
  = Table faostat_fa
  = Table faostat_fa_flat
2024-11-20 11:08:04 [warning  ] DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()` category=PerformanceWarning filename=/home/owid/etl/lib/catalog/owid/catalog/tables.py lineno=405
2024-11-20 11:09:07 [warning  ] DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()` category=PerformanceWarning filename=/home/owid/etl/lib/catalog/owid/catalog/tables.py lineno=405
2024-11-20 11:14:58 [warning  ] DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()` category=PerformanceWarning filename=/home/owid/etl/lib/catalog/owid/catalog/tables.py lineno=405
2024-11-20 11:20:04 [warning  ] DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()` category=PerformanceWarning filename=/home/owid/etl/lib/catalog/owid/catalog/tables.py lineno=405
= Dataset garden/lis/2024-06-13/luxembourg_income_study
  = Table luxembourg_income_study_adults
  = Table luxembourg_income_study
  = Table lis_percentiles
  = Table lis_percentiles_adults
⚠ Error: Index must be unique.
= Dataset garden/ophi/2024-10-28/multidimensional_poverty_index
  = Table multidimensional_poverty_index
    ~ Column censored_headcount_ratio (changed metadata)
-       - title: Share of population in multidimensional poverty deprived in the indicator <<indicator>> (<<area>>) - <<flavor>>
+       + title: Share of the population in multidimensional poverty deprived in the indicator <<indicator>> (<<area>>) - <<flavor>>
        ?                ++++
-       -   name: Share of population in multidimensional poverty deprived in the indicator <<indicator>>
+       +   name: Share of the population in multidimensional poverty deprived in the indicator <<indicator>>
        ?                 ++++
-       -   title_public: Share of population in multidimensional poverty deprived in the indicator <<indicator>>
+       +   title_public: Share of the population in multidimensional poverty deprived in the indicator <<indicator>>
        ?                         ++++
-       -   faqs:
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-definition
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-sources
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-indicators-unavailable
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-comparability
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-other-sources
    ~ Column headcount_ratio (changed metadata)
-       - title: Share of population living in multidimensional poverty (<<area>>) - <<flavor>>
        ?                            ---  ----
+       + title: Share of the population in multidimensional poverty (<<area>>) - <<flavor>>
        ?                ++++
-       -     Each household is assessed against specific thresholds for these indicators. For example, a household is considered deprived in the _electricity_ indicator if it does not have access to it. [This article](https://ourworldindata.org/multidimensional-poverty-index) discusses specific thresholds in more detail.
+       +     Households are assessed as being deprived in a given indicator if they do not meet a specific threshold for that indicator. [This article](https://ourworldindata.org/multidimensional-poverty-index) explains the specific thresholds.
-       -     Each indicator contributes to one of the three dimensions of well-being.  Health and education indicators are weighted more (1/6 each) than living standards indicators (1/18 each) so that all three dimensions contribute equally to the overall measure.
+       +     The indicators vary in weight: health and education indicators weigh 1/6, while living standards indicators weigh 1/18, making each dimension contribute equally to one-third of the total.
-       -   name: Share of population living in multidimensional poverty
        ?                             ---  ----
+       +   name: Share of the population in multidimensional poverty
        ?                 ++++
-       -   title_public: Share of population living in multidimensional poverty
        ?                                     ---  ----
+       +   title_public: Share of the population in multidimensional poverty
        ?                         ++++
-       -   faqs:
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-definition
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-sources
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-indicators-unavailable
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-comparability
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-other-sources
    ~ Column intensity (changed metadata)
-       -   Multidimensional poverty is defined as being deprived in a range of health, education and living standards indicators. The intensity is the share of indicators in which people in multidimensional poverty are deprived on average.<% if area == "Urban" %>
        ?                                                                                                                                                                            ^^^^^^^^^                 ^^^^^^^  ^^^^^^^^^^^^^^^
+       +   Multidimensional poverty is defined as being deprived in a range of health, education and living standards indicators. The intensity is the share of indicators in which the multidimensionally poor are deprived on average.<% if area == "Urban" %>
        ?                                                                                                                                                                            ^^^                 ^  ^^^^^^^^^^^^^^^^^^^^
-       -     Each household is assessed against specific thresholds for these indicators. For example, a household is considered deprived in the _electricity_ indicator if it does not have access to it. [This article](https://ourworldindata.org/multidimensional-poverty-index) discusses specific thresholds in more detail.
+       +     Households are assessed as being deprived in a given indicator if they do not meet a specific threshold for that indicator. [This article](https://ourworldindata.org/multidimensional-poverty-index) explains the specific thresholds.
-       -     Each indicator contributes to one of the three dimensions of well-being.  Health and education indicators are weighted more (1/6 each) than living standards indicators (1/18 each) so that all three dimensions contribute equally to the overall measure.
+       +     The indicators vary in weight: health and education indicators weigh 1/6, while living standards indicators weigh 1/18, making each dimension contribute equally to one-third of the total.
-       -   faqs:
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-definition
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-sources
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-indicators-unavailable
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-comparability
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-other-sources
    ~ Column mpi (changed metadata)
-       -   Multidimensional poverty is defined as being deprived in a range of health, education and living standards indicators. The Multidimensional Poverty Index (MPI) is a measure that combines the prevalence and the intensity of multidimensional poverty on a scale from 0 to 1. Higher values indicate higher poverty.<% if area == "Urban" %>
        ?                                                                                                                              --------------------------------   ^^^^^^^^^^^^^^^^^^^                                                              --------------------------------------------------------------
+       +   Multidimensional poverty is defined as being deprived in a range of health, education and living standards indicators. The MPI is a measure that combines the prevalence and the intensity of multidimensional poverty.<% if area == "Urban" %>
        ?                                                                                                                                 ^^^^^^^^^^^^^^^^^^
-       -     Each household is assessed against specific thresholds for these indicators. For example, a household is considered deprived in the _electricity_ indicator if it does not have access to it. [This article](https://ourworldindata.org/multidimensional-poverty-index) discusses specific thresholds in more detail.
+       +     Households are assessed as being deprived in a given indicator if they do not meet a specific threshold for that indicator. [This article](https://ourworldindata.org/multidimensional-poverty-index) explains the specific thresholds.
-       -     Each indicator contributes to one of the three dimensions of well-being.  Health and education indicators are weighted more (1/6 each) than living standards indicators (1/18 each) so that all three dimensions contribute equally to the overall measure.
+       +     The indicators vary in weight: health and education indicators weigh 1/6, while living standards indicators weigh 1/18, making each dimension contribute equally to one-third of the total.
-       -   faqs:
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-definition
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-sources
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-indicators-unavailable
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-comparability
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-other-sources
    ~ Column severe (changed metadata)
-       - title: Share of population living in severe multidimensional poverty (<<area>>) - <<flavor>>
        ?                            ---  ----
+       + title: Share of the population in severe multidimensional poverty (<<area>>) - <<flavor>>
        ?                ++++
-       -     Being in _severe_ multidimensional poverty means that a person lives in a household deprived in 50% or more of ten indicators, grouped into three dimensions of well-being: **health** (using two indicators: nutrition, child mortality), **education** (using two indicators: years of schooling, school attendance), and **living standards** (using five indicators: cooking fuel, sanitation, drinking water, electricity, housing, assets).
        ?           ---                              ^^^^^^^^^^^^^^^^^ ^^^^^^^   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+       +     Being _severely_ multidimensionally poor means that a person lives in a household deprived in 50% or more of ten indicators, grouped into three dimensions of well-being: **health** (using two indicators: nutrition, child mortality), **education** (using two indicators: years of schooling, school attendance), and **living standards** (using five indicators: cooking fuel, sanitation, drinking water, electricity, housing, assets).
        ?                  ++                  ++   +++++++++++++++++++++++++  ^^^^^^^^^^^^^^^^^^^ ^   ^^^^
-       -     Each household is assessed against specific thresholds for these indicators. For example, a household is considered deprived in the _electricity_ indicator if it does not have access to it. [This article](https://ourworldindata.org/multidimensional-poverty-index) discusses specific thresholds in more detail.
+       +     Households are assessed as being deprived in a given indicator if they do not meet a specific threshold for that indicator. [This article](https://ourworldindata.org/multidimensional-poverty-index) explains the specific thresholds.
-       -     Each indicator contributes to one of the three dimensions of well-being.  Health and education indicators are weighted more (1/6 each) than living standards indicators (1/18 each) so that all three dimensions contribute equally to the overall measure.
+       +     The indicators vary in weight: health and education indicators weigh 1/6, while living standards indicators weigh 1/18, making each dimension contribute equally to one-third of the total.
-       -   name: Share of population living in severe multidimensional poverty
        ?                             ---  ----
+       +   name: Share of the population in severe multidimensional poverty
        ?                 ++++
-       -   title_public: Share of population living in severe multidimensional poverty
        ?                                     ---  ----
+       +   title_public: Share of the population in severe multidimensional poverty
        ?                         ++++
-       -   faqs:
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-definition
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-sources
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-indicators-unavailable
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-comparability
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-other-sources
    ~ Column uncensored_headcount_ratio (changed metadata)
-       - title: Share of population deprived in the indicator <<indicator>> (<<area>>) - <<flavor>>
+       + title: Share of the population deprived in the indicator <<indicator>> (<<area>>) - <<flavor>>
        ?                ++++
-       -   name: Share of population deprived in the indicator <<indicator>>
+       +   name: Share of the population deprived in the indicator <<indicator>>
        ?                 ++++
-       -   title_public: Share of population deprived in the indicator <<indicator>>
+       +   title_public: Share of the population deprived in the indicator <<indicator>>
        ?                         ++++
-       -   faqs:
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-definition
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-sources
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-indicators-unavailable
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-comparability
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-other-sources
    ~ Column vulnerable (changed metadata)
-       - title: Share of population vulnerable to multidimensional poverty (<<area>>) - <<flavor>>
+       + title: Share of the population vulnerable to multidimensional poverty (<<area>>) - <<flavor>>
        ?                ++++
-       -     Being _vulnerable_ to multidimensional poverty means that a person lives in a household deprived in 20-33.3% of ten indicators, grouped into three dimensions of well-being: **health** (using two indicators: nutrition, child mortality), **education** (using two indicators: years of schooling, school attendance), and **living standards** (using five indicators: cooking fuel, sanitation, drinking water, electricity, housing, assets).
+       +     Being _vulnerable_ to multidimensional poverty means that a person lives in a household deprived in 20-33.33% of ten indicators, grouped into three dimensions of well-being: **health** (using two indicators: nutrition, child mortality), **education** (using two indicators: years of schooling, school attendance), and **living standards** (using five indicators: cooking fuel, sanitation, drinking water, electricity, housing, assets).
        ?                                                                                                                +
-       -     Each household is assessed against specific thresholds for these indicators. For example, a household is considered deprived in the _electricity_ indicator if it does not have access to it. [This article](https://ourworldindata.org/multidimensional-poverty-index) discusses specific thresholds in more detail.
+       +     Households are assessed as being deprived in a given indicator if they do not meet a specific threshold for that indicator. [This article](https://ourworldindata.org/multidimensional-poverty-index) explains the specific thresholds.
-       -     Each indicator contributes to one of the three dimensions of well-being.  Health and education indicators are weighted more (1/6 each) than living standards indicators (1/18 each) so that all three dimensions contribute equally to the overall measure.
+       +     The indicators vary in weight: health and education indicators weigh 1/6, while living standards indicators weigh 1/18, making each dimension contribute equally to one-third of the total.
-       -   name: Share of population vulnerable to multidimensional poverty
+       +   name: Share of the population vulnerable to multidimensional poverty
        ?                 ++++
-       -   title_public: Share of population vulnerable to multidimensional poverty
+       +   title_public: Share of the population vulnerable to multidimensional poverty
        ?                         ++++
-       -   faqs:
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-definition
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-sources
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-indicators-unavailable
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-comparability
-       -     - gdoc_id: 1gGburArxglFdHXeTLotFW4TOOLoeRq5XW6UfAdKtaAw
-       -       fragment_id: mpi-other-sources
= Dataset garden/regions/2023-01-01/regions
  = Table regions
    ~ Column aliases (changed data)
        ~ Changed values: 1 / 334 (0.30%)
          code                                                                                                                   aliases -                                                                                    aliases +
           CIV ["C\u00c3\u00b4te D'Ivoire", "C\u00f4te d'Ivoire", "C\u00f4te d\u2019Ivoire", "Ivory Coast", "C<U+00F4>te d<U+2019>Ivoire"] ["C\u00c3\u00b4te D'Ivoire", "C\u00f4te d'Ivoire", "C\u00f4te d\u2019Ivoire", "Ivory Coast"]
= Dataset garden/un/2022-07-11/un_wpp
  = Table fertility
  = Table demographic
  = Table un_wpp
  = Table migration
  = Table mortality
  = Table population
  = Table population_granular
    ~ Column value (changed data)
        ~ Changed values: 1391 / 39815008 (0.00%)
                  location  year    metric  sex age variant  value -  value +
                   Tokelau  2047 sex_ratio none  91    high     <NA>      inf
                   Tokelau  2046 sex_ratio none  91     low     <NA>      inf
            Western Sahara  1967 sex_ratio none  95  medium     <NA>      inf
          Falkland Islands  1957 sex_ratio none  89  medium     <NA>      inf
                   Tokelau  2033 sex_ratio none  78  medium     <NA>      inf
= Dataset garden/who/2024-09-09/flu_test
  = Table flu_test
    ~ Dim country
-       - Removed values: 10 / 72188 (0.01%)
                date   country
          2024-11-04 Hong Kong
          2024-11-04      Iran
          2024-11-04     Japan
          2024-11-04  Pakistan
          2024-11-04 Sri Lanka
    ~ Dim date
-       - Removed values: 10 / 72188 (0.01%)
            country       date
          Hong Kong 2024-11-04
               Iran 2024-11-04
              Japan 2024-11-04
           Pakistan 2024-11-04
          Sri Lanka 2024-11-04
    ~ Column denomcombined (changed data)
-       - Removed values: 10 / 72188 (0.01%)
            country       date  denomcombined
          Hong Kong 2024-11-04           6674
               Iran 2024-11-04           1822
              Japan 2024-11-04              5
           Pakistan 2024-11-04            347
          Sri Lanka 2024-11-04             76
        ~ Changed values: 21 / 72188 (0.03%)
            country       date  denomcombined -  denomcombined +
              Japan 2024-07-01               31               30
              Japan 2024-09-09               41               38
              Japan 2024-09-30               49               44
              Japan 2024-10-14               19               15
          Sri Lanka 2024-10-28               57               56
    ~ Column pcnt_poscombined (changed data)
-       - Removed values: 10 / 72188 (0.01%)
            country       date  pcnt_poscombined
          Hong Kong 2024-11-04          0.569374
               Iran 2024-11-04         10.153677
              Japan 2024-11-04              80.0
           Pakistan 2024-11-04          6.051873
          Sri Lanka 2024-11-04          9.210526
        ~ Changed values: 21 / 72188 (0.03%)
            country       date  pcnt_poscombined -  pcnt_poscombined +
              Japan 2024-07-01           38.709679           40.000000
              Japan 2024-09-09           65.853661           65.789474
              Japan 2024-09-30           75.510201           75.000000
              Japan 2024-10-14           73.684212           80.000000
          Sri Lanka 2024-10-28           14.035088           12.500000
= Dataset garden/worldbank_wdi/2022-05-26/wdi
  = Table wdi
    ~ Column omm_goods_exp_share_gdp (changed data)
        ~ Changed values: 1 / 14400 (0.01%)
          country  year  omm_goods_exp_share_gdp -  omm_goods_exp_share_gdp +
           Guyana  1977                  57.650002                  57.639999
    ~ Column omm_merch_exp_share_gdp (changed data)
        ~ Changed values: 1 / 14400 (0.01%)
           country  year  omm_merch_exp_share_gdp -  omm_merch_exp_share_gdp +
          Kiribati  1995                      12.43                      12.42
    ~ Column omm_net_savings_percap (changed data)
        ~ Changed values: 5 / 14400 (0.03%)
            country  year  omm_net_savings_percap -  omm_net_savings_percap +
            Albania  2008                 311.76001                311.750000
              Congo  2007               -328.380005               -328.369995
          Indonesia  2019                 592.23999                592.250000
             Israel  1970                243.539993                243.529999
             Panama  1995                694.280029                694.270020
= Dataset garden/worldbank_wdi/2024-05-20/wdi
  = Table wdi
    ~ Column omm_goods_exp_share_gdp (changed data)
        ~ Changed values: 4 / 14570 (0.03%)
              country  year  omm_goods_exp_share_gdp -  omm_goods_exp_share_gdp +
             Eswatini  2019                  44.349998                  44.360001
               Guyana  1977                  57.650002                  57.639999
            Singapore  2006                 188.800003                 188.789993
          Switzerland  2007                      42.73                  42.720001


Legend: +New  ~Modified  -Removed  =Identical  Details
Hint: Run this locally with etl diff REMOTE data/ --include yourdataset --verbose --snippet

Automatically updated datasets matching weekly_wildfires|excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk are not included

Edited: 2024-11-29 14:16:22 UTC
Execution time: 4.81 seconds

@pabloarosado pabloarosado marked this pull request as ready for review November 29, 2024 14:04
@@ -0,0 +1,49 @@
from etl import multidim
Collaborator

For me personally, showing monthly data as the default would make much more sense. Yearly data don't show 2024 and the drop in prices in 2023. It took me a while to discover that there is a monthly frequency.

Contributor Author

The problem with monthly data is that it only has one option: wholesale electricity prices. On the other hand, we have many more options for annual data. Using monthly as the default would make the explorer look very uninformative, since most dropdowns would have only one option.

unit: purchasing power standard
short_unit: PPS
description_short: Energy price in purchasing power standard.
gas_and_electricity_price_components_euro_flat:
Collaborator

I'm curious how you create all the combinations in this file. I guess it's done programmatically? If so, I think it would be useful to share the script or notebook in the same folder in case we have to make changes to it in the future.

Contributor Author

I haven't figured out a nice generic method for this. I usually use etl.helpers.print_tables_metadata_template, often tweaking it a bit. It's not a particularly sophisticated method... But once the yaml exists, I have total freedom to tweak things individually.
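
As a rough illustration of the "generate a skeleton, then tweak by hand" workflow described above, here is a minimal sketch that builds one metadata entry per combination of dimension values. It does not reproduce etl.helpers.print_tables_metadata_template (whose behaviour is not shown in this thread); the dimension names, the values, the `price` prefix, and the function name `build_metadata_template` are all assumptions made for the example.

```python
from itertools import product

# Hypothetical dimensions and values, chosen only for illustration.
DIMENSIONS = {
    "source": ["gas", "electricity"],
    "unit": ["euro", "pps"],
    "price_component": ["energy_and_supply", "taxes"],
}


def build_metadata_template(prefix: str, dimensions: dict[str, list[str]]) -> str:
    """Return a YAML-like skeleton with one entry per combination of dimension values."""
    lines = []
    for combination in product(*dimensions.values()):
        # Flattened column name, e.g. "price_gas_euro_taxes".
        column = "_".join([prefix, *combination])
        lines.append(f"{column}:")
        lines.append("  title: TODO")
        lines.append("  unit: TODO")
    return "\n".join(lines)


template = build_metadata_template("price", DIMENSIONS)
print(template)
```

The printed skeleton can be pasted into the yaml file and edited entry by entry, which keeps the freedom to adjust individual indicators.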

@@ -162,3 +167,74 @@ def fetch_variables_from_table(table: str, engine: Engine) -> pd.DataFrame:
    df_dims = pd.DataFrame(dims, index=df.index)

    return df.join(df_dims)


def generate_views_for_dimensions(
Collaborator

@Marigold Marigold Dec 2, 2024

Nice! It'd be good to unify generate_views_for_dimensions and expand_views into a single function in the future. They both do pretty much the same thing: take a list of (flattened) indicator paths (be it from ETL or DB) and create views from it. expand_views has more options for aggregations and generate_views_for_dimensions has better logging, but they are essentially the same.

(I'm happy to do it, though there's no rush.)

Member

Yes, I'd be in favour of harmonizing / consolidating functions

Contributor Author

Yes, totally! I initially thought of harmonizing both functions. But then I thought they are quite different in what they do. If we are moving towards mdim steps that do not read from DB, maybe there's not much gain in adapting expand_views. So, until we decide on that, we can keep these two versions.
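
For readers following the thread, the shared idea (take flattened indicator names and create one view per combination of dimension values) might be sketched as below. This is not the code in etl/multidim.py: the function name `generate_views_sketch`, the view dictionary layout, and the `prefix_dim1_dim2` naming convention are assumptions for illustration only.

```python
from itertools import product


def generate_views_sketch(
    dimensions: dict[str, list[str]], columns: set[str], prefix: str
) -> list[dict]:
    """Create one view per combination of dimension values that has a matching column."""
    views = []
    for combination in product(*dimensions.values()):
        # Assumed convention: flattened column names join the dimension values with "_".
        column = "_".join([prefix, *combination])
        if column not in columns:
            # No data for this combination, so no view is created.
            continue
        views.append(
            {
                "dimensions": dict(zip(dimensions, combination)),
                "indicators": {"y": column},
            }
        )
    return views


# Hypothetical dimensions and flattened column names.
dims = {"source": ["gas", "electricity"], "unit": ["euro", "pps"]}
available = {"price_gas_euro", "price_electricity_pps"}
views = generate_views_sketch(dims, available, prefix="price")
print(views)
```

Only the two combinations with a matching column produce a view; the other two are skipped.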

etl/multidim.py Outdated
log.warning(f"Combination {slug_combination} found in multiple tables: {relevant_table}")

# Construct the indicator path.
indicator_path = (
Collaborator

Nit - this won't work for long indicator paths, you'd have to use trim_long_variable_name on short_name (slug_combination).

Contributor Author

Thanks for pointing that out! Fixed.
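
To illustrate why long names need special handling, here is a minimal sketch of the general technique: cut the name to a maximum length and append a short hash so that distinct long names stay distinct. It is not the implementation of trim_long_variable_name; the 255-character limit, the hash function, and the name `trim_long_name_sketch` are assumptions.

```python
import hashlib

MAX_LENGTH = 255  # Assumed limit, for illustration only.


def trim_long_name_sketch(short_name: str, max_length: int = MAX_LENGTH) -> str:
    """Trim a long name and append a hash of the full name to keep it unique."""
    if len(short_name) <= max_length:
        return short_name
    # Hash the full name so that names sharing a long prefix do not collide.
    digest = hashlib.sha256(short_name.encode("utf-8")).hexdigest()[:16]
    keep = max_length - len(digest) - 2  # Leave room for the "__" separator.
    return f"{short_name[:keep]}__{digest}"


long_a = "x" * 300 + "_a"
long_b = "x" * 300 + "_b"
print(len(trim_long_name_sketch(long_a)))
```

If an indicator path is built from the untrimmed name while the stored name was trimmed, the two no longer match, which is the failure the nit above points at.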

Collaborator

@Marigold Marigold left a comment

Looks good! I'm glad you found an easy way to make it work for flattened data from ETL. I'd be interested in talking about whether to flatten or not and how to handle combinations in metadata. It seems like there are pros and cons to both, and it really depends on the dataset and its complexity.

Member

@lucasrodes lucasrodes left a comment

Please have a look at the mdim data page and give feedback.

MDIM page looks good to me!

Please review the proposed alternative way to create mdim steps without accessing DB in etl/steps/export/multidim/energy/latest/energy_prices.py.

Left some comments!

Please review the changes in etl/multidim.py.

Left some comments!


Other comments

Docs

Also, could you consolidate your changes with these docs? Even if the docs are changing constantly, it's good to signal that with a warning, or to give others some signal of how to even start with MDIMs.

Long / wide format

Would your solution also work with long-formatted tables?

@pabloarosado pabloarosado merged commit 3a36b22 into master Dec 4, 2024
8 checks passed
@pabloarosado pabloarosado deleted the get-eurostat-data-on-energy-prices branch December 4, 2024 11:23
4 participants