Add asynchronous indexing? #10
Comments
The problem with that approach is that it could produce erroneous query results: imagine that you ask for registers in which a field has a certain value, and some register has not yet been indexed. The result would not be accurate. For this reason, indexing happens in the same atomic block that updates the register. An alternative approach is to defer the index updates but force all of them before making any query. This would need some thread synchronization that either blocks further updates during the query, or performs the query optimistically and invalidates it if more index updates were in flight at that moment, so that the query has to be performed again, and so on. This is equivalent to reimplementing STM at a different level. I think it is worth investigating whether the slowdown justifies the pain of the change. In any case, I think laziness helps, since it defers some operations until they are needed. Note also that the indexing happens in user-space memory, so it should be much faster than in conventional databases.
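To make the "defer, but force before any query" alternative concrete, here is a minimal sketch using only `stm` and `containers`. It does not use the TCache API: `Store`, `writeDeferred`, `flushIndex` and `queryByValue` are hypothetical names, and keys and values are simplified to `String` and `Int`. It assumes the flush runs inside the same STM transaction as the query, so STM itself handles the retry; the indexing cost is then paid by the first query rather than by each write.

```haskell
import Control.Concurrent.STM
import Data.List (foldl')
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- Hypothetical stand-ins for illustration; these are not TCache types.
type Key   = String
type Value = Int

data Store = Store
  { records :: TVar (M.Map Key Value)              -- primary data
  , index   :: TVar (M.Map Value (S.Set Key))      -- value -> keys holding it
  , pending :: TVar [(Key, Maybe Value, Value)]    -- writes not yet indexed, newest first
  }

-- Update the record and only remember that the index is stale.
writeDeferred :: Store -> Key -> Value -> STM ()
writeDeferred s k v = do
  old <- M.lookup k <$> readTVar (records s)
  modifyTVar' (records s) (M.insert k v)
  modifyTVar' (pending s) ((k, old, v) :)

-- Apply every pending index update, oldest first.
flushIndex :: Store -> STM ()
flushIndex s = do
  ps <- swapTVar (pending s) []
  modifyTVar' (index s) (\ix -> foldl' apply ix (reverse ps))
  where
    apply ix (k, old, new) =
      let dropped = maybe ix (\o -> M.adjust (S.delete k) o ix) old
      in  M.insertWith S.union new (S.singleton k) dropped

-- Flush and read in one transaction, so the query never sees a stale index.
queryByValue :: Store -> Value -> STM [Key]
queryByValue s v = do
  flushIndex s
  maybe [] S.toList . M.lookup v <$> readTVar (index s)

demo :: IO [Key]
demo = do
  s <- Store <$> newTVarIO M.empty <*> newTVarIO M.empty <*> newTVarIO []
  atomically $ writeDeferred s "a" 1
  atomically $ writeDeferred s "b" 1
  atomically $ writeDeferred s "a" 2   -- "a" moves from value 1 to value 2
  atomically $ queryByValue s 1
```

Whether this is faster overall would still need the benchmark discussed in this thread, since the work is moved rather than removed.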
Good point. I was thinking that the user of the library, in this case, would explicitly be committing to being eventually consistent, rather than always consistent.
This made me wonder (and I took a quick look but am new to STM), what is the call in
I'll see if I can come up with a benchmark example; it seemed to be a deal breaker for me at the time, but I was doing everything with the default back-end (file-based). I will be working on switching to Cassandra very soon but am unsure how this affects the stored index as yet.
In short, I can make this optional, so that in the setup the programmer can specify the kind of consistency they want. I have to look at the details first and think about it.
No, I meant that if asynchronous indexing AND consistency are both required, then something like a re-implementation of STM over STM would be necessary. But never mind.
Cassandra would not imply any change other than slower or faster indexing, since the indexing happens in memory. By the way, I'm working on a distributed database based on TCache.
Sorry for the delay in looking into this. I have confirmed one thing though. TCache does generate index files, e.g. for my
In `Data.TCache.IndexQuery`, `index` causes any new DBRefs to be indexed immediately. In some cases, it may be better to have an alternative to `index`, say, `eventuallyIndex`, where another thread is kicked off that handles indexing and writes a log of records that still need to be indexed (this could just be a duplicate of the actual record, or at least its key, as this might be fastest).

If the application or server is halted before indexing is completed, the next time `eventuallyIndex` is run, it will kick off the thread again and resume working on any logs, if they exist.

This could help where performance is important or a limiting factor; I suspect (though am not sure) that bulk indexing could potentially be significantly faster than the current one-at-a-time method.
Just an idea - maybe there are better ways to do it.
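A rough sketch of the shape this could take, assuming nothing about TCache internals: `Indexer`, `noteWrite`, `queryEventually` and `waitIndexed` are hypothetical names, `eventuallyIndex` here is only an illustration of the proposed function, and an in-memory `TQueue` stands in for the on-disk log (so resuming after a restart is not covered). For brevity it only adds index entries and does not remove stale ones when a record changes.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (forever)
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- Hypothetical stand-ins for illustration; these are not TCache types.
type Key   = String
type Value = Int

data Indexer = Indexer
  { backlog :: TQueue (Key, Value)               -- in-memory stand-in for the log
  , index   :: TVar (M.Map Value (S.Set Key))    -- value -> keys holding it
  }

-- Start a background thread that drains the backlog into the index.
-- Each entry is dequeued and indexed in a single transaction, so an
-- empty backlog means every noted write has been indexed.
eventuallyIndex :: IO Indexer
eventuallyIndex = do
  ix <- Indexer <$> newTQueueIO <*> newTVarIO M.empty
  _ <- forkIO . forever . atomically $ do
         (k, v) <- readTQueue (backlog ix)
         modifyTVar' (index ix) (M.insertWith S.union v (S.singleton k))
  pure ix

-- Called from the writer's transaction instead of indexing immediately.
noteWrite :: Indexer -> Key -> Value -> STM ()
noteWrite ix k v = writeTQueue (backlog ix) (k, v)

-- Eventually consistent query: may miss writes still in the backlog.
queryEventually :: Indexer -> Value -> STM [Key]
queryEventually ix v = maybe [] S.toList . M.lookup v <$> readTVar (index ix)

-- Block until the backlog has been fully drained.
waitIndexed :: Indexer -> STM ()
waitIndexed ix = isEmptyTQueue (backlog ix) >>= check

demo :: IO [Key]
demo = do
  ix <- eventuallyIndex
  atomically $ mapM_ (uncurry (noteWrite ix)) [("a", 1), ("b", 2), ("c", 1)]
  atomically $ waitIndexed ix
  atomically $ queryEventually ix 1
```

A caller that needs an up-to-date answer can run `waitIndexed` first; one that accepts eventual consistency can call `queryEventually` directly.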