Changed SPARQL query in Count Lang Concepts #848

Merged 1 commit on Feb 13, 2019

Conversation

aturbati
Contributor

Better performance when counting the languages for all concepts by optimizing the SPARQL query

@osma
Member

osma commented Feb 13, 2019

Thanks for the PR, @aturbati! This is very welcome, since the counting is currently slow. Can you explain a bit what the query does (e.g. show the full query) and how it performs on, for example, AGROVOC or some other large vocabulary, compared to the old one?

@osma added this to the 2.2 milestone Feb 13, 2019
@aturbati
Contributor Author

The new query is this one:
SELECT ?lang ?prop (COUNT(?label) as ?count)
WHERE {
  GRAPH <http://aims.fao.org/aos/agrovoc/> {
    VALUES (?type) { (<http://www.w3.org/2004/02/skos/core#Concept>) }
    VALUES (?prop) { (skos:prefLabel) (skos:altLabel) (skos:hiddenLabel) }
    ?conc a ?type .
    ?conc ?prop ?label .
    BIND(lang(?label) AS ?lang)
    FILTER(?lang = 'ar' || ?lang = 'my' || ?lang = 'zh' || ?lang = 'cs' || ?lang = 'nl' || ?lang = 'en' || ?lang = 'fi' || ?lang = 'fr' || ?lang = 'ka' || ?lang = 'de' || ?lang = 'hi' || ?lang = 'hu' || ?lang = 'it' || ?lang = 'ja' || ?lang = 'km' || ?lang = 'ko' || ?lang = 'lo' || ?lang = 'la' || ?lang = 'ms' || ?lang = 'no' || ?lang = 'fa' || ?lang = 'pl' || ?lang = 'pt' || ?lang = 'ro' || ?lang = 'ru' || ?lang = 'sk' || ?lang = 'es' || ?lang = 'sv' || ?lang = 'te' || ?lang = 'th' || ?lang = 'tr' || ?lang = 'uk' || ?lang = 'vi')
  }
}
GROUP BY ?lang ?prop ?type

The main difference from the old query is that the language check is now done directly with BIND/FILTER instead of VALUES.
On AGROVOC this query takes less than 10 seconds to execute, while the old one took more than 5 minutes (and the triple store often aborted it for performance reasons).

@osma
Member

osma commented Feb 13, 2019

Sounds great! I will give it a try and if there are no problems this can be merged for 2.2.
The PHP code for generating the FILTER condition looks a bit rough and probably needs a cleanup (e.g. using implode instead of concatenating strings) but that's not a big issue.
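
To illustrate the cleanup idea, here is a minimal sketch in Python rather than PHP, where str.join plays the role that implode would play in the PHP code. The function name build_lang_filter and the language-tag check are assumptions made for this example; they are not taken from the Skosmos code base.

```python
import re

# Hypothetical helper for illustration; not the Skosmos implementation.
# Loose pattern for language tags such as 'en' or 'zh-Hans' (an assumption,
# not a full BCP 47 validator).
_LANG_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def build_lang_filter(langs):
    """Return a SPARQL FILTER that accepts any of the given language tags."""
    langs = list(langs)
    if not langs:
        raise ValueError("at least one language tag is required")
    for lang in langs:
        # Reject anything that could break out of the quoted literal.
        if not _LANG_TAG.fullmatch(lang):
            raise ValueError(f"not a valid language tag: {lang!r}")
    # join() puts the separator only between items, so there is no
    # trailing ' || ' to trim as with manual string concatenation.
    conditions = " || ".join(f"?lang = '{lang}'" for lang in langs)
    return f"FILTER({conditions})"


print(build_lang_filter(["en", "sv", "fi"]))
```

Joining a list of per-language conditions keeps the separator logic in one place, which is the main readability gain over appending to a string inside a loop.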

@osma merged commit 085a9dc into NatLibFi:master Feb 13, 2019
@osma
Member

osma commented Feb 13, 2019

I ended up heavily rewriting both the SPARQL query and the PHP code to generate it in subsequent commits, but the new SPARQL query still avoids the VALUES for languages and instead uses FILTER, like this (for YSO which has "only" three languages):

SELECT ?lang ?prop
  (COUNT(?label) as ?count)
WHERE {
  GRAPH <http://www.yso.fi/onto/yso/> {
    VALUES (?type) { (<http://www.w3.org/2004/02/skos/core#Concept>) }
    VALUES (?prop) { (skos:prefLabel) (skos:altLabel) (skos:hiddenLabel) }
    ?conc a ?type .
    ?conc ?prop ?label .
    BIND(LANG(?label) AS ?lang)
    FILTER(?lang IN ('en','sv','fi'))
  }
}
GROUP BY ?lang ?prop ?type

The performance improvement for YSO is not as dramatic as with AGROVOC - from 3 seconds to around 1.8 seconds - but still substantial.
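
As a rough sketch of how a query of this shape could be assembled for an arbitrary vocabulary, the Python function below fills in the graph URI and the language list. It is only an illustration: the name build_count_query, the input checks, and the explicit PREFIX line (added so the query is self-contained) are assumptions, and the actual generator in Skosmos is written in PHP.

```python
import re


# Hypothetical helper for illustration; not the Skosmos implementation.
def build_count_query(graph_uri, langs):
    """Build a label-count query that restricts languages via FILTER ... IN."""
    langs = list(langs)
    if not langs or not all(re.fullmatch(r"[A-Za-z0-9-]+", l) for l in langs):
        raise ValueError("langs must be a non-empty list of plain language tags")
    if re.search(r"[<>\s]", graph_uri):
        raise ValueError("graph_uri must not contain '<', '>' or whitespace")
    # Quote each tag and join with commas to form the IN (...) list.
    lang_list = ",".join(f"'{l}'" for l in langs)
    # Doubled braces are literal braces inside an f-string.
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?lang ?prop
  (COUNT(?label) as ?count)
WHERE {{
  GRAPH <{graph_uri}> {{
    VALUES (?type) {{ (skos:Concept) }}
    VALUES (?prop) {{ (skos:prefLabel) (skos:altLabel) (skos:hiddenLabel) }}
    ?conc a ?type .
    ?conc ?prop ?label .
    BIND(LANG(?label) AS ?lang)
    FILTER(?lang IN ({lang_list}))
  }}
}}
GROUP BY ?lang ?prop ?type"""


print(build_count_query("http://www.yso.fi/onto/yso/", ["en", "sv", "fi"]))
```

The languages stay in a FILTER rather than a VALUES block, matching the approach described above.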

Thanks a lot for suggesting this change!

@aturbati
Contributor Author

OK, I'm happy this helps make SKOSMOS a more responsive tool (it needs fewer resources from the SPARQL endpoint, and there are no more problems showing AGROVOC in SKOSMOS).
