finisher: Set cached_*_count fields #370
Comments
@yolile I forget what we use these for. Are they just a quick "sense check" as to whether the collection succeeded? If so, we have other ways of determining that.
The use case is described in #183, but maybe we now have faster queries to calculate these numbers?
I ran:

    time psql "dbname=ocdskingfisherprocess user=jmckinney host=postgres-readonly.kingfisher.open-contracting.org sslmode=require" -c '\t' \
      -c 'SELECT COUNT(*), collection_id FROM compiled_release GROUP BY collection_id'

and it completes in 25s. Having run it once, it now completes in 5s (caching, I assume). I can do a JOIN (again, maybe caching is helping), and it returns in 7s.

    time psql "dbname=ocdskingfisherprocess user=jmckinney host=postgres-readonly.kingfisher.open-contracting.org sslmode=require" -c '\t' \
      -c 'SELECT COUNT(*), collection_id FROM collection c INNER JOIN compiled_release cr ON c.id = cr.collection_id GROUP BY collection_id'

Note that in a real-world scenario, you'd probably limit by

At the time #183 was resolved, there was no

These days, I think analysts mostly use https://kingfisher-colab.readthedocs.io/en/latest/#ocdskingfishercolab.list_collections, which returns the cached counts for all 3 tables (release, record, compiled_release). Having the cached counts offers efficiency in terms of:

I don't think (2) is that important. Seconds is fine for an operation that is performed rarely, e.g. once per feedback report. For (1), if the queries are being done by

Furthermore, do analysts care about the numbers in
Having the number of releases and records or compiled releases is useful for a quick check of whether the publisher implements change history (e.g. if the number of releases equals the number of compiled releases, then there is no change history).
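The check described in the comment above could be written as a small helper. This is only a sketch of the heuristic as stated: the function name and parameters are hypothetical, and the counts are assumed to come from wherever they are available (the cached fields or the alternative queries).

```python
def has_change_history(releases_count: int, compiled_releases_count: int) -> bool:
    """Heuristic from the comment above (hypothetical helper, not part of any library).

    If a collection has exactly as many releases as compiled releases, each
    contracting process has a single release, so there is no change history.
    More releases than compiled releases suggests that at least one process
    was published more than once.
    """
    return releases_count > compiled_releases_count


# Example: 25 releases compiled into 10 compiled releases suggests change history.
print(has_change_history(25, 10))
print(has_change_history(10, 10))
```

This only gives a quick signal. It does not say which processes have more than one release.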
Okay, I guess we probably want to keep these. They are pretty easy to implement in the finisher worker.
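A rough sketch of what setting the counts at the end of processing might look like, not the actual finisher code. It uses an in-memory SQLite database as a stand-in for PostgreSQL, with a simplified schema. The column names `cached_releases_count`, `cached_records_count` and `cached_compiled_releases_count` are assumptions based on the `cached_*_count` pattern in the issue title, and `set_cached_counts` and `demo` are hypothetical names.

```python
import sqlite3

# Assumed mapping from data table to cached count column on the collection table.
TABLES = {
    "release": "cached_releases_count",
    "record": "cached_records_count",
    "compiled_release": "cached_compiled_releases_count",
}


def set_cached_counts(conn, collection_id):
    """Count rows per table for one collection and store the counts on the collection."""
    counts = {}
    for table, column in TABLES.items():
        # Table and column names come from the fixed mapping above, never from user input.
        (count,) = conn.execute(
            f'SELECT COUNT(*) FROM "{table}" WHERE collection_id = ?', (collection_id,)
        ).fetchone()
        conn.execute(
            f'UPDATE collection SET "{column}" = ? WHERE id = ?', (count, collection_id)
        )
        counts[column] = count
    conn.commit()
    return counts


def demo():
    """Build a tiny stand-in schema, run the update, and return the stored counts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE collection (
            id INTEGER PRIMARY KEY,
            cached_releases_count INTEGER,
            cached_records_count INTEGER,
            cached_compiled_releases_count INTEGER
        );
        CREATE TABLE "release" (id INTEGER PRIMARY KEY, collection_id INTEGER);
        CREATE TABLE "record" (id INTEGER PRIMARY KEY, collection_id INTEGER);
        CREATE TABLE "compiled_release" (id INTEGER PRIMARY KEY, collection_id INTEGER);
        INSERT INTO collection (id) VALUES (1);
        INSERT INTO "release" (collection_id) VALUES (1), (1), (1);
        INSERT INTO "compiled_release" (collection_id) VALUES (1), (1);
        """
    )
    set_cached_counts(conn, 1)
    return conn.execute(
        "SELECT cached_releases_count, cached_records_count, "
        "cached_compiled_releases_count FROM collection WHERE id = 1"
    ).fetchone()


print(demo())
```

Computing the counts once, when the collection is complete, means later reads (for example, listing collections) would not need to scan the data tables.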
In addition to what Yohanna mentioned before, it would be useful to easily review whether the publisher is including the same number of processes in its records, releases, and bulk publications.
Indeed - for now you'll have to run the alternative queries.
Workaround: