📊 energy: Get Eurostat data on energy prices (#3499)

* 📊 energy: Get Eurostat data on energy prices
* Add snapshot, and create data steps skeleton
* Fix missing dataset code
* Prepare meadow step
* Prepare garden step (WIP)
* Harmonize country names and other improvements of garden step
* Keep working on garden step, mostly mapping different fields
* Map energy price components
* Improve garden dataset
* Adapt code to ignore historical data
* Improve garden step
* Working on garden step (still WIP)
* Garden step (WIP)
* Improve garden step
* Work on garden step and start grapher step
* Prepare grapher step
* Improve metadata
* Impose that certain price components need to be informed
* Add data from prices datasets, include checks, and improve metadata
* Fix table name
* Create another grapher step for prices and improve metadata
* Improve grapher steps
* Remove repeated step in the dag
* Add documentation explaining findings about components
* Add sanity checks and remove TODOs
* Improve metadata
* Add key descriptions to price component variables
* Add key descriptions to prices variables
* Add short descriptions
* Add analysis comparing price components data and price data
* Improve checks
* Add total price to the components data
* Improve metadata
* Improve metadata
* Improve metadata
* Improve format
* Add monthly wholesale electricity prices from Ember
* Add IEA fossil fuel subsidies data (WIP)
* Adapt meadow step
* Adapt garden and grapher steps
* Adapt garden and grapher steps
* Include additional indicators from IEA
* Add other IEA indicators and improve metadata
* Fix old read_table
* Fix export steps ignored by PathFinder
* Create energy prices dataset and mdim explorer
* Various improvements
* Add missing pps data
* Add price components views and other improvements
* Add map tabs (not working properly, there might be a bug somewhere)
* Simplify Eurostat steps
* Delete Eurostat grapher steps and simplify
* Refactor mdim step and add general function to multidim module
* Complete to-do
* Remove unnecessary grapher step
* Homogenize prices
* Remove to-do
* Improve metadata
* Trim long variable names
* Simplify function that expands views, and add documentation
* Update the docs
owid · Dec 4, 2024 · 3a36b22
1 parent 1018780

Showing 29 changed files with 3,506 additions and 74 deletions.
diff --git a/dag/energy.yml b/dag/energy.yml
@@ -228,3 +228,54 @@ steps:
   #
   data://grapher/energy/2024-11-15/photovoltaic_cost_and_capacity:
     - data://garden/energy/2024-11-15/photovoltaic_cost_and_capacity
+  #
+  # Eurostat - Energy statistics, prices of natural gas and electricity
+  #
+  data://meadow/eurostat/2024-11-05/gas_and_electricity_prices:
+    - snapshot://eurostat/2024-11-05/gas_and_electricity_prices.zip
+  #
+  # Eurostat - Energy statistics, prices of natural gas and electricity
+  #
+  data://garden/eurostat/2024-11-05/gas_and_electricity_prices:
+    - data://meadow/eurostat/2024-11-05/gas_and_electricity_prices
+  #
+  # Ember - European wholesale electricity prices
+  #
+  data://meadow/ember/2024-11-20/european_wholesale_electricity_prices:
+    - snapshot://ember/2024-11-20/european_wholesale_electricity_prices.csv
+  #
+  # Ember - European wholesale electricity prices
+  #
+  data://garden/ember/2024-11-20/european_wholesale_electricity_prices:
+    - data://meadow/ember/2024-11-20/european_wholesale_electricity_prices
+  #
+  # IEA - Fossil fuel subsidies
+  #
+  data://meadow/iea/2024-11-20/fossil_fuel_subsidies:
+    - snapshot://iea/2024-11-20/fossil_fuel_subsidies.xlsx
+  #
+  # IEA - Fossil fuel subsidies
+  #
+  data://garden/iea/2024-11-20/fossil_fuel_subsidies:
+    - data://meadow/iea/2024-11-20/fossil_fuel_subsidies
+  #
+  # IEA - Fossil fuel subsidies
+  #
+  data://grapher/iea/2024-11-20/fossil_fuel_subsidies:
+    - data://garden/iea/2024-11-20/fossil_fuel_subsidies
+  #
+  # Energy prices
+  #
+  data://garden/energy/2024-11-20/energy_prices:
+    - data://garden/eurostat/2024-11-05/gas_and_electricity_prices
+    - data://garden/ember/2024-11-20/european_wholesale_electricity_prices
+  #
+  # Energy prices
+  #
+  data://grapher/energy/2024-11-20/energy_prices:
+    - data://garden/energy/2024-11-20/energy_prices
+  #
+  # Energy prices explorer
+  #
+  export://multidim/energy/latest/energy_prices:
+    - data://grapher/energy/2024-11-20/energy_prices
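
The energy-prices entries added above form a small dependency chain that ends in the export step. As a rough illustration only (this diff does not show how the ETL itself resolves the DAG), the standard-library `graphlib` module can derive one valid execution order from the same dependencies. The `dag` dictionary is copied by hand from the entries above and is not read from `dag/energy.yml`.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps it depends on (copied by hand from the entries above).
dag = {
    "data://garden/energy/2024-11-20/energy_prices": [
        "data://garden/eurostat/2024-11-05/gas_and_electricity_prices",
        "data://garden/ember/2024-11-20/european_wholesale_electricity_prices",
    ],
    "data://grapher/energy/2024-11-20/energy_prices": [
        "data://garden/energy/2024-11-20/energy_prices",
    ],
    "export://multidim/energy/latest/energy_prices": [
        "data://grapher/energy/2024-11-20/energy_prices",
    ],
}

# static_order() yields every step only after all of its dependencies.
order = list(TopologicalSorter(dag).static_order())
for step in order:
    print(step)
```

Whatever order the two upstream garden steps come out in, the combined garden step, the grapher step, and the export step always run last, in that sequence.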
diff --git a/docs/guides/data-work/export-data.md b/docs/guides/data-work/export-data.md
@@ -31,101 +31,93 @@ ds_explorer.save()
 
 Multi-dimensional indicators are powered by a configuration that is typically created from a YAML file. The structure of the YAML file looks like this:
 
-```yaml title="etl/steps/export/multidim/covid/latest/covid.deaths.yaml"
-definitions:
-  table: {definitions.table}
-
+```yaml title="etl/steps/export/multidim/energy/latest/energy_prices.yaml"
 title:
-  title: COVID-19 deaths
-  titleVariant: by interval
+  title: "Energy prices"
+  titleVariant: "by energy source"
 defaultSelection:
-  - World
-  - Europe
-  - Asia
+  - "European Union (27)"
 topicTags:
-  - COVID-19
-
+  - "Energy"
 dimensions:
-  - slug: interval
-    name: Interval
+  - slug: "frequency"
+    name: "Frequency"
     choices:
-      - slug: weekly
-        name: Weekly
-        description: null
-      - slug: biweekly
-        name: Biweekly
-        description: null
-
-  - slug: metric
-    name: Metric
+      - slug: "annual"
+        name: "Annual"
+        description: "Annual data"
+      - slug: "monthly"
+        name: "Monthly"
+        description: "Monthly data"
+  - slug: "source"
+    name: "Energy source"
     choices:
-      - slug: absolute
-        name: Absolute
-        description: null
-      - slug: per_capita
-        name: Per million people
-        description: null
-      - slug: change
-        name: Change from previous interval
-        description: null
-
+      - slug: "electricity"
+        name: "Electricity"
+      - slug: "gas"
+        name: "Gas"
+  - slug: "unit"
+    name: "Unit"
+    choices:
+      - slug: "euro"
+        name: "Euro"
+        description: "Price in euros"
+      - slug: "pps"
+        name: "PPS"
+        description: "Price in Purchasing Power Standard"
 views:
-  - dimensions:
-      interval: weekly
-      metric: absolute
-    indicators:
-      y: "{definitions.table}#weekly_deaths"
-  - dimensions:
-      interval: weekly
-      metric: per_capita
-    indicators:
-      y: "{definitions.table}#weekly_deaths_per_million"
-  - dimensions:
-      interval: weekly
-      metric: change
-    indicators:
-      y: "{definitions.table}#weekly_pct_growth_deaths"
-
-  - dimensions:
-      interval: biweekly
-      metric: absolute
-    indicators:
-      y: "{definitions.table}#biweekly_deaths"
-  - dimensions:
-      interval: biweekly
-      metric: per_capita
-    indicators:
-      y: "{definitions.table}#biweekly_deaths_per_million"
-  - dimensions:
-      interval: biweekly
-      metric: change
-    indicators:
-      y: "{definitions.table}#biweekly_pct_growth_deaths"
+  # Views will be filled out programmatically.
+  []
+
 ```
 
-The `dimensions` field specifies selectors, and the `views` field defines views for the selection. Since there are numerous possible configurations, `views` are usually generated programmatically. However, it's a good idea to create a few of them manually to start.
+The `dimensions` field specifies selectors, and the `views` field defines views for the selection. Since there are numerous possible configurations, `views` are usually generated programmatically (using the function `etl.multidim.generate_views_for_dimensions`).
 
 You can also combine manually defined views with generated ones. See the `etl.multidim` module for available helper functions or refer to examples from `etl/steps/export/multidim/`. Feel free to add or modify the helper functions as needed.
 
-The export step loads the YAML file, adds `views` to the config, and then calls the function.
+The export step loads the data dependencies and the config YAML file, adds `views` to the config, and then pushes the configuration to the database.
 
-```python title="etl/steps/export/multidim/covid/latest/covid.py"
+```python title="etl/steps/export/multidim/energy/latest/energy_prices.py"
 def run(dest_dir: str) -> None:
-    engine = get_engine()
-
-    # Load YAML file
-    config = paths.load_mdim_config("covid.deaths.yaml")
+    #
+    # Load inputs.
+    #
+    # Load data on energy prices.
+    ds_grapher = paths.load_dataset("energy_prices")
+
+    # Read tables of annual and monthly prices.
+    tb_annual = ds_grapher.read("energy_prices_annual")
+    tb_monthly = ds_grapher.read("energy_prices_monthly")
+
+    #
+    # Process data.
+    #
+    # Load configuration from adjacent yaml file.
+    config = paths.load_mdim_config()
+
+    # Create views.
+    config["views"] = multidim.generate_views_for_dimensions(
+        dimensions=config["dimensions"],
+        tables=[tb_annual, tb_monthly],
+        dimensions_order_in_slug=("frequency", "source", "unit"),
+        warn_on_missing_combinations=False,
+        additional_config={"chartTypes": ["LineChart"], "hasMapTab": True, "tab": "map"},
+    )
+
+    #
+    # Save outputs.
+    #
+    multidim.upsert_multidim_data_page(slug="mdd-energy-prices", config=config, engine=get_engine())
 
-    multidim.upsert_multidim_data_page("mdd-energy", config, engine)
 ```
 
 To see the multi-dimensional indicator in Admin, run
 
 ```bash
-etlr export://multidim/energy/latest/energy --export
+etlr export://multidim/energy/latest/energy_prices --export
 ```
 
-and check out the preview at http://staging-site-my-branch/admin/grapher/mdd-name.
+and check out the preview at: http://staging-site-my-branch/admin/grapher/mdd-energy-prices
 
 
 ## Exporting data to GitHub

diff --git a/etl/helpers.py b/etl/helpers.py
@@ -594,7 +594,7 @@ def _get_attributes_from_step_name(step_name: str) -> Dict[str, str]:
         if channel_type.startswith(("walden", "snapshot")):
             channel = channel_type
             namespace, version, short_name = path.split("/")
-        elif channel_type.startswith(("data",)):
+        elif channel_type.startswith(("data", "export")):
             channel, namespace, version, short_name = path.split("/")
         else:
             raise WrongStepName
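
The one-line change above lets `export://` step names be split the same way as `data://` ones, since both carry four path components. A minimal sketch of that parsing follows, assuming the step name is split on `://` first. `parse_step_name` is a hypothetical stand-in, not the actual `_get_attributes_from_step_name` (whose full body is not shown in this hunk), and it raises `ValueError` where the original raises `WrongStepName`.

```python
from typing import Dict


def parse_step_name(step_name: str) -> Dict[str, str]:
    """Hypothetical, simplified parser for ETL step names."""
    # Assumption: the prefix before "://" is the channel type.
    channel_type, path = step_name.split("://", 1)
    if channel_type in ("walden", "snapshot"):
        # Three components: namespace/version/short_name.
        channel = channel_type
        namespace, version, short_name = path.split("/")
    elif channel_type in ("data", "export"):
        # Four components: channel/namespace/version/short_name.
        channel, namespace, version, short_name = path.split("/")
    else:
        raise ValueError(f"Unrecognized step name: {step_name}")
    return {
        "channel": channel,
        "namespace": namespace,
        "version": version,
        "short_name": short_name,
    }


export_attrs = parse_step_name("export://multidim/energy/latest/energy_prices")
snapshot_attrs = parse_step_name("snapshot://iea/2024-11-20/fossil_fuel_subsidies.xlsx")
```

For the export step the channel resolves to `multidim`, while for the snapshot it stays `snapshot`.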

diff --git a/etl/multidim.py b/etl/multidim.py
@@ -1,14 +1,20 @@
 import json
+from itertools import product
 
 import pandas as pd
 import yaml
 from sqlalchemy.engine import Engine
+from structlog import get_logger
 
 from apps.chart_sync.admin_api import AdminAPI
 from etl.config import OWID_ENV
 from etl.db import read_sql
+from etl.grapher_io import trim_long_variable_name
 from etl.helpers import map_indicator_path_to_id
 
+# Initialize logger.
+log = get_logger()
+
 
 def upsert_multidim_data_page(slug: str, config: dict, engine: Engine) -> None:
     validate_multidim_config(config, engine)
@@ -162,3 +168,103 @@ def fetch_variables_from_table(table: str, engine: Engine) -> pd.DataFrame:
     df_dims = pd.DataFrame(dims, index=df.index)
 
     return df.join(df_dims)
+
+
+def generate_views_for_dimensions(
+    dimensions, tables, dimensions_order_in_slug=None, additional_config=None, warn_on_missing_combinations=True
+):
+    """Generate individual views for all possible combinations of dimensions in a list of flattened tables.
+
+    Parameters
+    ----------
+    dimensions : List[Dict[str, Any]]
+        Dimensions, as given in the configuration of the multidim step, e.g.
+        [
+            {'slug': 'frequency', 'name': 'Frequency', 'choices': [{'slug': 'annual','name': 'Annual'}, {'slug': 'monthly', 'name': 'Monthly'}]},
+            {'slug': 'source', 'name': 'Energy source', 'choices': [{'slug': 'electricity', 'name': 'Electricity'}, {'slug': 'gas', 'name': 'Gas'}]},
+            ...
+        ]
+    tables : List[Table]
+        Tables whose indicator views will be generated.
+    dimensions_order_in_slug : Tuple[str], optional
+        Dimension names, as they appear in "dimensions", and in the order in which they are spelled out in indicator names. For example, if indicator names look like annual_electricity_euro, then dimensions_order_in_slug would be ("frequency", "source", "unit").
+    additional_config : Dict[str, Any], optional
+        Additional config fields to add to each view, e.g.
+        {"chartTypes": ["LineChart"], "hasMapTab": True, "tab": "map"}
+    warn_on_missing_combinations : bool, optional
+        True to warn if any combination of dimensions is not found among the indicators in the given tables.
+
+    Returns
+    -------
+    results : List[Dict[str, Any]]
+        Views configuration, e.g.
+        [
+            {'dimensions': {'frequency': 'annual', 'source': 'electricity', 'unit': 'euro'}, 'indicators': {'y': 'grapher/energy/2024-11-20/energy_prices/energy_prices_annual#annual_electricity_household_total_price_including_taxes_euro'}},
+            {'dimensions': {'frequency': 'annual', 'source': 'electricity', 'unit': 'pps'}, 'indicators': {'y': 'grapher/energy/2024-11-20/energy_prices/energy_prices_annual#annual_electricity_household_total_price_including_taxes_pps'}},
+            ...
+        ]
+
+    """
+    # Map each dimension slug to the list of its choice slugs.
+    choices = {dim["slug"]: [choice["slug"] for choice in dim["choices"]] for dim in dimensions}
+    dimension_slugs_in_config = set(choices.keys())
+
+    # Sanity check for dimensions_order_in_slug.
+    if dimensions_order_in_slug:
+        dimension_slugs_in_order = set(dimensions_order_in_slug)
+
+        # Check if any slug in the order is missing from the config.
+        missing_slugs = dimension_slugs_in_order - dimension_slugs_in_config
+        if missing_slugs:
+            raise ValueError(
+                f"The following dimensions are in 'dimensions_order_in_slug' but not in the config: {missing_slugs}"
+            )
+
+        # Check if any slug in the config is missing from the order.
+        extra_slugs = dimension_slugs_in_config - dimension_slugs_in_order
+        if extra_slugs:
+            log.warning(
+                f"The following dimensions are in the config but not in 'dimensions_order_in_slug': {extra_slugs}"
+            )
+
+        # Reorder choices to match the specified order.
+        choices = {dim: choices[dim] for dim in dimensions_order_in_slug if dim in choices}
+
+    # Generate all combinations of the choices.
+    all_combinations = list(product(*choices.values()))
+
+    # Create the views.
+    results = []
+    for combination in all_combinations:
+        # Map dimension slugs to the chosen values.
+        dimension_mapping = {dim_slug: choice for dim_slug, choice in zip(choices.keys(), combination)}
+        slug_combination = "_".join(combination)
+
+        # Find relevant tables for the current combination.
+        relevant_table = []
+        for table in tables:
+            if slug_combination in table:
+                relevant_table.append(table)
+
+        # Handle missing or multiple table matches.
+        if len(relevant_table) == 0:
+            if warn_on_missing_combinations:
+                log.warning(f"Combination {slug_combination} not found in tables")
+            continue
+        elif len(relevant_table) > 1:
+            log.warning(f"Combination {slug_combination} found in multiple tables: {relevant_table}")
+
+        # Construct the indicator path.
+        indicator_path = f"{relevant_table[0].metadata.dataset.uri}/{relevant_table[0].metadata.short_name}#{trim_long_variable_name(slug_combination)}"
+        indicators = {
+            "y": indicator_path,
+        }
+        # Append the combination to results.
+        results.append({"dimensions": dimension_mapping, "indicators": indicators})
+
+    if additional_config:
+        # Include additional fields in all results.
+        for result in results:
+            result.update({"config": additional_config})
+
+    return results
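
The core of `generate_views_for_dimensions` is a Cartesian product over the choices of each dimension, keeping only the combinations that exist as columns. The sketch below isolates that idea under simplifying assumptions: `sketch_views` is a hypothetical helper, tables are replaced by a plain dictionary mapping an invented table path to its column names, and the ordering checks, logging, name trimming, and `additional_config` handling of the real function are omitted.

```python
from itertools import product


def sketch_views(dimensions, columns_by_table):
    """Simplified stand-in for generate_views_for_dimensions (hypothetical helper)."""
    # Map each dimension slug to the list of its choice slugs.
    choices = {dim["slug"]: [c["slug"] for c in dim["choices"]] for dim in dimensions}
    views = []
    for combination in product(*choices.values()):
        # Column names are assumed to be the choice slugs joined by underscores.
        column = "_".join(combination)
        # Keep only the tables that actually contain this column.
        matches = [path for path, columns in columns_by_table.items() if column in columns]
        if not matches:
            continue
        views.append(
            {
                "dimensions": dict(zip(choices, combination)),
                "indicators": {"y": f"{matches[0]}#{column}"},
            }
        )
    return views


dimensions = [
    {"slug": "frequency", "choices": [{"slug": "annual"}, {"slug": "monthly"}]},
    {"slug": "source", "choices": [{"slug": "electricity"}, {"slug": "gas"}]},
    {"slug": "unit", "choices": [{"slug": "euro"}, {"slug": "pps"}]},
]
# Invented table paths and columns, for illustration only.
columns_by_table = {
    "example/annual_table": ["annual_electricity_euro", "annual_gas_euro"],
    "example/monthly_table": ["monthly_electricity_euro"],
}
views = sketch_views(dimensions, columns_by_table)
```

Of the eight possible combinations, only the three that match a column produce a view; the rest are skipped silently, which corresponds to calling the real function with `warn_on_missing_combinations=False`.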
diff --git a/etl/steps/data/garden/ember/2024-11-20/european_wholesale_electricity_prices.countries.json b/etl/steps/data/garden/ember/2024-11-20/european_wholesale_electricity_prices.countries.json
@@ -0,0 +1,31 @@
+{
+  "Austria": "Austria",
+  "Belgium": "Belgium",
+  "Bulgaria": "Bulgaria",
+  "Croatia": "Croatia",
+  "Czechia": "Czechia",
+  "Denmark": "Denmark",
+  "Estonia": "Estonia",
+  "Finland": "Finland",
+  "France": "France",
+  "Germany": "Germany",
+  "Greece": "Greece",
+  "Hungary": "Hungary",
+  "Ireland": "Ireland",
+  "Italy": "Italy",
+  "Latvia": "Latvia",
+  "Lithuania": "Lithuania",
+  "Luxembourg": "Luxembourg",
+  "Netherlands": "Netherlands",
+  "North Macedonia": "North Macedonia",
+  "Norway": "Norway",
+  "Poland": "Poland",
+  "Portugal": "Portugal",
+  "Romania": "Romania",
+  "Serbia": "Serbia",
+  "Slovakia": "Slovakia",
+  "Slovenia": "Slovenia",
+  "Spain": "Spain",
+  "Sweden": "Sweden",
+  "Switzerland": "Switzerland"
+}