Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Facets: wrong count on multiple tkeywords filtering #11652

Closed
giohappy opened this issue Nov 2, 2023 · 0 comments · Fixed by #11655
Closed

Facets: wrong count on multiple tkeywords filtering #11652

giohappy opened this issue Nov 2, 2023 · 0 comments · Fixed by #11655
Assignees
Labels
Milestone

Comments

@giohappy
Copy link
Contributor

giohappy commented Nov 2, 2023

This problem was verified both for Thesauris keywords, and free text keywords.
The SQL queries that are generated when pre-filtering these facets contain repeated INNER JOINs which causes a (cartesian) multiplicative effect on the returned counts per item.
It looks like the filters and annotations added by the Provider.get_facet_items() to the queryset are causing the issue. The problem is clearly visible in the following video, where counts double when multiple keys are selected.

facets_doubled_counts

Here below three examples for Tehsauris Keywords.

Ex.1 - Facet without prefiltering query params.

http://localhost:8000/api/v2/facets/3-2-4-1-gemet-inspire-themes-rdf?page=0&pageSize=20

Only one INNER JOIN with resource_base

select
	"base_thesauruskeyword"."image",
	"base_resourcebase_tkeywords"."thesauruskeyword_id",
	"base_thesauruskeyword"."alt_label",
	COUNT("base_resourcebase_tkeywords"."thesauruskeyword_id") as "count",
	(
	select
		U0."label"
	from
		"base_thesauruskeywordlabel" U0
	where
		(U0."keyword_id" = "base_resourcebase_tkeywords"."thesauruskeyword_id"
			and U0."lang" = 'en')) as "localized_label"
from
	"base_resourcebase"
inner join "base_resourcebase_tkeywords" on
	("base_resourcebase"."id" = "base_resourcebase_tkeywords"."resourcebase_id")
inner join "base_thesauruskeyword" on
	("base_resourcebase_tkeywords"."thesauruskeyword_id" = "base_thesauruskeyword"."id")
inner join "base_thesaurus" on
	("base_thesauruskeyword"."thesaurus_id" = "base_thesaurus"."id")
where
	(not "base_resourcebase"."dirty_state"
		and not "base_resourcebase"."metadata_only"
		and "base_thesaurus"."identifier" = '3-2-4-1-gemet-inspire-themes-rdf')
group by
	"base_thesauruskeyword"."image",
	"base_resourcebase_tkeywords"."thesauruskeyword_id",
	"base_thesauruskeyword"."alt_label"
order by
	"count" desc

Output:

NULL	ac	3	Atmospheric conditions
NULL	af	1	Agricultural and aquaculture facilities

Ex. 2: Facet with pre-filtering, single keyword

http://localhost:8000/api/v2/facets/3-2-4-1-gemet-inspire-themes-rdf?filter{tkeywords}=16&page=0&pageSize=20

Multiple INNER JOINs are generated.

select
	distinct T5."image",
	T4."thesauruskeyword_id",
	T5."alt_label",
	COUNT(T4."thesauruskeyword_id") as "count",
	(
	select
		U0."label"
	from
		"base_thesauruskeywordlabel" U0
	where
		(U0."keyword_id" = T4."thesauruskeyword_id"
			and U0."lang" = 'en')) as "localized_label"
from
	"base_resourcebase"
inner join "base_resourcebase_tkeywords" on
	("base_resourcebase"."id" = "base_resourcebase_tkeywords"."resourcebase_id")
inner join "base_resourcebase_tkeywords" T4 on
	("base_resourcebase"."id" = T4."resourcebase_id")
inner join "base_thesauruskeyword" T5 on
	(T4."thesauruskeyword_id" = T5."id")
inner join "base_thesaurus" on
	(T5."thesaurus_id" = "base_thesaurus"."id")
where
	("base_resourcebase_tkeywords"."thesauruskeyword_id" = 16
		and not "base_resourcebase"."dirty_state"
		and not "base_resourcebase"."dirty_state"
		and not "base_resourcebase"."metadata_only"
		and "base_thesaurus"."identifier" = '3-2-4-1-gemet-inspire-themes-rdf')
group by
	T5."image",
	T4."thesauruskeyword_id",
	T5."alt_label"
order by
	"count" desc

Result is still ok, because we have only one the cartesian multiplication with keyword 16:

NULL	16	ac	3	Atmospheric conditions
NULL	14	af	1	Agricultural and aquaculture facilities

Ex. 3: Facet with pre-filtering, multiple keywords

http://localhost:8000/api/v2/facets/3-2-4-1-gemet-inspire-themes-rdf?filter{tkeywords}=16&filter{tkeywords}=14&include_topics=true&key=16&key=14

select
	distinct T5."image",
	T4."thesauruskeyword_id",
	T5."alt_label",
	COUNT(T4."thesauruskeyword_id") as "count",
	(
	select
		U0."label"
	from
		"base_thesauruskeywordlabel" U0
	where
		(U0."keyword_id" = T4."thesauruskeyword_id"
			and U0."lang" = 'en')) as "localized_label"
from
	"base_resourcebase"
inner join "base_resourcebase_tkeywords" on
	("base_resourcebase"."id" = "base_resourcebase_tkeywords"."resourcebase_id")
inner join "base_resourcebase_tkeywords" T4 on
	("base_resourcebase"."id" = T4."resourcebase_id")
inner join "base_thesauruskeyword" T5 on
	(T4."thesauruskeyword_id" = T5."id")
inner join "base_thesaurus" on
	(T5."thesaurus_id" = "base_thesaurus"."id")
where
	("base_resourcebase_tkeywords"."thesauruskeyword_id" in (14, 16)
		and not "base_resourcebase"."dirty_state"
		and not "base_resourcebase"."dirty_state"
		and not "base_resourcebase"."metadata_only"
		and T4."thesauruskeyword_id" in (14, 16)
			and "base_thesaurus"."identifier" = '3-2-4-1-gemet-inspire-themes-rdf')
group by
	T5."image",
	T4."thesauruskeyword_id",
	T5."alt_label"
order by
	"count" desc

Result is wrong, because of cartesian multiplication.

NULL	16	ac	4	Atmospheric conditions
NULL	14	af	2	Agricultural and aquaculture facilities

The effect accumulates if additional keywords are added to the prefiltering, as it can be seen in the example below (custom GeoNode)

facets_doubled_counts_2

@giohappy giohappy added the bug label Nov 2, 2023
@giohappy giohappy added this to the 4.2.0 milestone Nov 2, 2023
@etj etj changed the title Wrong SQL query count generated when pre-filtering facets Facets: wrong count on multiple tkeywords filtering Nov 3, 2023
@etj etj linked a pull request Nov 3, 2023 that will close this issue
12 tasks
giohappy pushed a commit that referenced this issue Nov 6, 2023
…11655)

* [Fixes #11652] Facets: wrong count on multiple tkeywords filtering - tkeywords

* [Fixes #11652] Facets: wrong count on multiple tkeywords filtering - category

* [Fixes #11652] Facets: wrong count on multiple tkeywords filtering - region

* [Fixes #11652] Facets: wrong count on multiple tkeywords filtering - keywords
giohappy pushed a commit that referenced this issue Nov 6, 2023
…11655)

* [Fixes #11652] Facets: wrong count on multiple tkeywords filtering - tkeywords

* [Fixes #11652] Facets: wrong count on multiple tkeywords filtering - category

* [Fixes #11652] Facets: wrong count on multiple tkeywords filtering - region

* [Fixes #11652] Facets: wrong count on multiple tkeywords filtering - keywords
giohappy added a commit that referenced this issue Nov 6, 2023
* thesaurus topics should only return its own keys (#11644)

* [Fixes #11652] Facets: wrong count on multiple tkeywords filtering (#11655)

* [Fixes #11652] Facets: wrong count on multiple tkeywords filtering - tkeywords

* [Fixes #11652] Facets: wrong count on multiple tkeywords filtering - category

* [Fixes #11652] Facets: wrong count on multiple tkeywords filtering - region

* [Fixes #11652] Facets: wrong count on multiple tkeywords filtering - keywords

---------

Co-authored-by: Emanuele Tajariol <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants