Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Synthese] Groupement des observations par maille lent #2766

Closed
mvergez opened this issue Oct 25, 2023 · 1 comment
Closed

[Synthese] Groupement des observations par maille lent #2766

mvergez opened this issue Oct 25, 2023 · 1 comment

Comments

@mvergez
Copy link
Contributor

mvergez commented Oct 25, 2023

Bonjour,

Dans le cadre d'une prestation pour l'Agence Régionale de la Biodiversité en île de France, il a été remonté que le groupement par maille ne fonctionnait pas dans la version 2.13.3 de GeoNature.

Problématique

En effet, la requête prend beaucoup trop de temps (plus de 100 secondes) et donc part en timeout.

La requête SQL est un peu longue donc je l'ai masquée :
WITH "OBS" AS (
SELECT
	json_build_object('id',
	gn_synthese.v_synthese_for_web_app.id_synthese,
	'date_min',
	gn_synthese.v_synthese_for_web_app.date_min,
	'lb_nom',
	gn_synthese.v_synthese_for_web_app.lb_nom,
	'cd_nom',
	gn_synthese.v_synthese_for_web_app.cd_nom,
	'observers',
	gn_synthese.v_synthese_for_web_app.observers,
	'dataset_name',
	gn_synthese.v_synthese_for_web_app.dataset_name,
	'url_source',
	gn_synthese.v_synthese_for_web_app.url_source,
	'unique_id_sinp',
	gn_synthese.v_synthese_for_web_app.unique_id_sinp,
	'nom_vern_or_lb_nom',
	COALESCE(NULLIF(gn_synthese.v_synthese_for_web_app.nom_vern,
	''),
	gn_synthese.v_synthese_for_web_app.lb_nom),
	'count_min_max',
	CASE
		WHEN (gn_synthese.v_synthese_for_web_app.count_min != gn_synthese.v_synthese_for_web_app.count_max) THEN concat(gn_synthese.v_synthese_for_web_app.count_min,
		' - ',
		gn_synthese.v_synthese_for_web_app.count_max)
		WHEN (gn_synthese.v_synthese_for_web_app.count_min IS NOT NULL) THEN concat(gn_synthese.v_synthese_for_web_app.count_min)
		ELSE ''
	END,
	'entity_source_pk_value',
	gn_synthese.v_synthese_for_web_app.entity_source_pk_value) AS obs_as_json,
	gn_synthese.v_synthese_for_web_app.id_synthese AS id_synthese
FROM
	gn_synthese.v_synthese_for_web_app
WHERE
	gn_synthese.v_synthese_for_web_app.the_geom_4326 IS NOT NULL
	AND (gn_synthese.v_synthese_for_web_app.id_nomenclature_sensitivity = 65
		OR gn_synthese.v_synthese_for_web_app.id_synthese IN (
		SELECT
			gn_synthese.cor_observer_synthese.id_synthese
		FROM
			gn_synthese.cor_observer_synthese
		WHERE
			gn_synthese.cor_observer_synthese.id_role = 1)
		OR gn_synthese.v_synthese_for_web_app.id_digitiser = 1)
ORDER BY
	gn_synthese.v_synthese_for_web_app.date_min DESC
LIMIT 2000),
"OBSERVATIONS" AS (
SELECT
	ref_geo.l_areas.geojson_4326 AS geojson,
	"OBS".obs_as_json AS obs_as_json
FROM
	"OBS"
LEFT OUTER JOIN LATERAL (
	SELECT
		gn_synthese.cor_area_synthese.id_synthese AS id_synthese,
		ref_geo.l_areas.id_area AS id_area
	FROM
		gn_synthese.v_synthese_for_web_app,
		gn_synthese.cor_area_synthese
	JOIN ref_geo.l_areas ON
		ref_geo.l_areas.id_area = gn_synthese.cor_area_synthese.id_area
	JOIN ref_geo.bib_areas_types ON
		ref_geo.bib_areas_types.id_type = ref_geo.l_areas.id_type
	WHERE
		gn_synthese.cor_area_synthese.id_synthese = gn_synthese.v_synthese_for_web_app.id_synthese
		AND ref_geo.bib_areas_types.type_code = 'M5') AS agg_areas ON
	agg_areas.id_synthese = "OBS".id_synthese
LEFT OUTER JOIN ref_geo.l_areas ON
	ref_geo.l_areas.id_area = agg_areas.id_area)
SELECT
	"OBSERVATIONS".geojson,
	json_build_object('observations',
	json_agg("OBSERVATIONS".obs_as_json)) AS json_build_object_1
FROM
	"OBSERVATIONS"
GROUP BY
	"OBSERVATIONS".geojson

Si on regarde un peu plus en détail, on s'aperçoit que cette sous-requête est également très lente :

SELECT
		gn_synthese.cor_area_synthese.id_synthese AS id_synthese,
		ref_geo.l_areas.id_area AS id_area
	FROM
		gn_synthese.v_synthese_for_web_app,
		gn_synthese.cor_area_synthese
	JOIN ref_geo.l_areas ON
		ref_geo.l_areas.id_area = gn_synthese.cor_area_synthese.id_area
	JOIN ref_geo.bib_areas_types ON
		ref_geo.bib_areas_types.id_type = ref_geo.l_areas.id_type
	WHERE
		gn_synthese.cor_area_synthese.id_synthese = gn_synthese.v_synthese_for_web_app.id_synthese
		AND ref_geo.bib_areas_types.type_code = 'M5') AS agg_areas

Car elle fait une requête sur toute la table cor_area_synthese qui fait quasi 5Go dans l'instance de l'ARB mais aussi sur gn_synthese.v_synthese_for_web_app.

Potentielle solution

Pensez-vous qu'il serait plutôt possible de faire la requête sur "OBS" directement ? Comme celle si est déjà exécutée on remonterait la visualisation maille de ces observations là, cela serait alors plus logique.

En modifiant :

.where(CorAreaSynthese.id_synthese == VSyntheseForWebApp.id_synthese)

par :

.where(CorAreaSynthese.id_synthese == obs_query.c.id_synthese)
La requête SQL devient :
WITH "OBS" AS (
SELECT
	json_build_object('id',
	gn_synthese.v_synthese_for_web_app.id_synthese,
	'date_min',
	gn_synthese.v_synthese_for_web_app.date_min,
	'lb_nom',
	gn_synthese.v_synthese_for_web_app.lb_nom,
	'cd_nom',
	gn_synthese.v_synthese_for_web_app.cd_nom,
	'observers',
	gn_synthese.v_synthese_for_web_app.observers,
	'dataset_name',
	gn_synthese.v_synthese_for_web_app.dataset_name,
	'url_source',
	gn_synthese.v_synthese_for_web_app.url_source,
	'unique_id_sinp',
	gn_synthese.v_synthese_for_web_app.unique_id_sinp,
	'nom_vern_or_lb_nom',
	COALESCE(NULLIF(gn_synthese.v_synthese_for_web_app.nom_vern,
	''),
	gn_synthese.v_synthese_for_web_app.lb_nom),
	'count_min_max',
	CASE
		WHEN (gn_synthese.v_synthese_for_web_app.count_min != gn_synthese.v_synthese_for_web_app.count_max) THEN concat(gn_synthese.v_synthese_for_web_app.count_min,
		' - ',
		gn_synthese.v_synthese_for_web_app.count_max)
		WHEN (gn_synthese.v_synthese_for_web_app.count_min IS NOT NULL) THEN concat(gn_synthese.v_synthese_for_web_app.count_min)
		ELSE ''
	END,
	'entity_source_pk_value',
	gn_synthese.v_synthese_for_web_app.entity_source_pk_value) AS obs_as_json,
	gn_synthese.v_synthese_for_web_app.id_synthese AS id_synthese
FROM
	gn_synthese.v_synthese_for_web_app
WHERE
	gn_synthese.v_synthese_for_web_app.the_geom_4326 IS NOT NULL
	AND (gn_synthese.v_synthese_for_web_app.id_nomenclature_sensitivity = 65
		OR gn_synthese.v_synthese_for_web_app.id_synthese IN (
		SELECT
			gn_synthese.cor_observer_synthese.id_synthese
		FROM
			gn_synthese.cor_observer_synthese
		WHERE
			gn_synthese.cor_observer_synthese.id_role = 1)
		OR gn_synthese.v_synthese_for_web_app.id_digitiser = 1)
ORDER BY
	gn_synthese.v_synthese_for_web_app.date_min DESC
LIMIT 2000),
"OBSERVATIONS" AS (
SELECT
	ref_geo.l_areas.geojson_4326 AS geojson,
	"OBS".obs_as_json AS obs_as_json
FROM
	"OBS"
LEFT OUTER JOIN LATERAL (
	SELECT
		gn_synthese.cor_area_synthese.id_synthese AS id_synthese,
		ref_geo.l_areas.id_area AS id_area
	FROM
		gn_synthese.cor_area_synthese
	JOIN ref_geo.l_areas ON
		ref_geo.l_areas.id_area = gn_synthese.cor_area_synthese.id_area
	JOIN ref_geo.bib_areas_types ON
		ref_geo.bib_areas_types.id_type = ref_geo.l_areas.id_type
	WHERE
		gn_synthese.cor_area_synthese.id_synthese = "OBS".id_synthese
		AND ref_geo.bib_areas_types.type_code = 'M5') AS agg_areas ON
	agg_areas.id_synthese = "OBS".id_synthese
LEFT OUTER JOIN ref_geo.l_areas ON
	ref_geo.l_areas.id_area = agg_areas.id_area)
SELECT
	"OBSERVATIONS".geojson,
	json_build_object('observations',
	json_agg("OBSERVATIONS".obs_as_json)) AS json_build_object_1
FROM
	"OBSERVATIONS"
GROUP BY
	"OBSERVATIONS".geojson

Il n'y a donc plus de FROM gn_synthese.v_synthese_for_web_app mais bien gn_synthese.cor_area_synthese.id_synthese = "OBS".id_synthese

Elle est aussi beaucoup plus performante !

Cela parait tout de même bizarre qu'on puisse faire un WHERE sans FROM "OBS" mais peut-être pourriez-vous m'éclairez là dessus ?

J'attends vos retours avec impatience !

@TheoLechemia
Copy link
Member

@mvergez j'arrive un peu après la bataille, mais les LATERAL JOIN permettent de faire reference à des FROM déclarés précedemment, contrairement au JOIN normal.
Je viens de voir ça en faisant des recherche sur la question :

Subqueries appearing in FROM can be preceded by the key word LATERAL. This allows them to reference columns provided by preceding FROM items. (Without LATERAL, each subquery is evaluated independently and so cannot cross-reference any other FROM item.)

https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-LATERAL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants