Ability to disable the Achilles cache #2034
Comments
From discussion with @chrisknoll and @alex-odysseus this morning, there are a few questions we'd like to answer about the current (v2.11) behavior of the Achilles cache:
@chrisknoll @alex-odysseus: If I missed any questions, please post them here so we can keep track and document the current behavior. This will help decide how to proceed for v2.12.
Sergey, please chime in @ssuvorov-fls
So in an environment with a large number of data sources (let's say 40), we'd want to set "cache.jobs.count" to 8, which would then spawn 5 jobs (40/8 == 5) to copy over the data? Given the default job queue length of 10, we would still have 5 worker jobs available to service cohort generations, etc. Do I have this correct?
This is good to know from an operations perspective.
It would be ideal if this were only performed once - presumably Achilles data is static unless a CDM is refreshed?
Also good to know from an operations perspective, as people may want to purge this data; ideally, deleting the source from ATLAS would kick off a process to remove those cache results.
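The job arithmetic in the question above can be written out as a small sketch. It assumes the reading stated in the question, which the thread does not confirm: that "cache.jobs.count" is the number of sources handled per job. The function and parameter names are hypothetical and are not taken from WebAPI.

```python
import math


def warming_jobs(source_count, sources_per_job, queue_length=10):
    """Return (cache jobs spawned, worker slots left in the queue).

    Assumption (from the question, unconfirmed): the setting is a
    per-job source count, so jobs = ceil(sources / sources_per_job).
    """
    if sources_per_job <= 0:
        raise ValueError("sources_per_job must be positive")
    jobs = math.ceil(source_count / sources_per_job)
    free_slots = max(queue_length - jobs, 0)
    return jobs, free_slots


# 40 sources, 8 per job, default queue length of 10
print(warming_jobs(40, 8))  # (5, 5)
```

If the setting instead means the total number of jobs, the division runs the other way (8 jobs of 5 sources each), which would leave only 2 free slots in a queue of 10.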
Idea per @chrisknoll: Could we extend the source_daimon table to include an int "caching" field? The default would be 0, and where we want to enable the cache, we set it to 1.
Note: I'd put this at the 'source' level (not sourceDaimon), because I think the caching applies to the entire source and not the individual daimon level (i.e., CDM, RESULTS, VOCABULARY, etc.). It might make sense to do it at the daimon level later, but for now I think a switch at the entire source level is a good first step.
I have a branch for implementing is_cache_enabled on a source, and the warmCache function now filters by Vocabulary daimon + Results daimon + isCacheEnabled == true. However, I'm concerned about what happens when you go to an Achilles report on a source that doesn't have cache enabled. @ssuvorov-fls: I think you mentioned in another thread that even when cache is disabled, it will read from the cache. Does that mean that if cache is not enabled, the WebAPI cache table will be empty, and the code will always read from the empty cache table? I would expect (and could try to implement) that when the cache is disabled, it should just read from the CDM results table directly. Can you confirm the behavior, and if it always reads from the cache table even when cache is disabled, can you propose where to check if caching is enabled so that we read from the results table?
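The filter described for warmCache can be sketched roughly as follows. This is only an illustration of the selection rule (Vocabulary daimon + Results daimon + cache enabled), not the code in the branch; the `Source` class, its fields, and `sources_to_warm` are hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    # Hypothetical stand-in for a WebAPI source record.
    key: str
    daimons: set = field(default_factory=set)  # e.g. {"CDM", "Vocabulary", "Results"}
    is_cache_enabled: bool = False


def sources_to_warm(sources):
    """Keep sources that have both required daimons and opt in to caching."""
    required = {"Vocabulary", "Results"}
    return [s for s in sources if s.is_cache_enabled and required <= s.daimons]


sources = [
    Source("busy", {"CDM", "Vocabulary", "Results"}, is_cache_enabled=True),
    Source("rare", {"CDM", "Vocabulary", "Results"}, is_cache_enabled=False),
    Source("vocab_only", {"Vocabulary"}, is_cache_enabled=True),
]
print([s.key for s in sources_to_warm(sources)])  # ['busy']
```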
@chrisknoll the main difference between warming the CDM cache and caching CDM results during a request is speed: caching the whole CDM is much faster
Understood. Thank you, @ssuvorov-fls. It sounds like there is a case in the code where, if the cache is empty, it will fetch the results from the CDM. I think you are saying that if caching is disabled on a source, it will find the cache empty and fetch from the source. The only issue is that it will then put the data into the cache and return it to the user, when I'd want it to just return the data to the user. But for the first implementation, I believe it will work: we can turn caching on/off per source, and then let the fetching of the CDM results fall back to pulling from the CDM source. Let me know if I have anything wrong here. The goal is to let WebAPI start up with a minimum set of cached sources (i.e., frequently used ones can be cached=true), while the other, less-used CDMs can be cached later without holding up WebAPI startup.
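The read path discussed above might look something like this sketch. It shows the behavior described as preferable (fall back to the results table and store the result only when caching is enabled for the source), not what WebAPI currently does. All names are hypothetical, and a plain dict stands in for the WebAPI cache table.

```python
def get_record_counts(source_key, cache_enabled, cache, fetch_from_results):
    """Return counts from the cache if present, else from the results table.

    The fetched counts are written back to the cache only when caching
    is enabled for the source; otherwise they are just returned.
    """
    cached = cache.get(source_key)
    if cached:
        return cached
    counts = fetch_from_results(source_key)
    if cache_enabled:
        cache[source_key] = counts
    return counts


def fake_fetch(source_key):
    # Hypothetical stand-in for a query against the source's results schema.
    return {"concept_1": 42}


cache = {}
get_record_counts("rare", False, cache, fake_fetch)
print(cache)  # {} (nothing stored for a source with caching disabled)
get_record_counts("busy", True, cache, fake_fetch)
print(cache)  # {'busy': {'concept_1': 42}}
```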
@chrisknoll |
fixes #2034
Set cdm.cache.cron.warming.enable = false by default
Made achilles_result_concept_count a required table.
Modified ddl population for record count table
Records are cached from achilles_result_concept_count instead of achilles_results.
Co-authored-by: Chris Knoll <[email protected]>
Per #2032, we'd like the ability to: 1) disable the Achilles cache warming process and 2) control it using the priority of the results daimon. Also relates to discussion on #2031.