Add asynchronous indexing? #10

Open
bbarker opened this issue Feb 15, 2019 · 4 comments

Comments

@bbarker
Contributor

bbarker commented Feb 15, 2019

In Data.TCache.IndexQuery, index causes any new DBRefs to be indexed immediately. In some cases, it may be better to have an alternative to index, say eventuallyIndex, where another thread is kicked off to handle indexing, and a log is written of the records that still need to be indexed (the log could hold just a duplicate of the actual record, or at least its key, as this might be fastest).

If the application or server is halted before indexing is completed, the next time eventuallyIndex is run, it will kick off the thread again and resume working on any logs, if they exist.

This could help where performance is important or a limiting factor; I suspect (though I am not sure) that bulk indexing could be significantly faster than the current one-at-a-time method.

Just an idea - maybe there are better ways to do it.
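To make the proposal concrete, here is a minimal sketch of the deferred-indexing idea on a toy in-memory model. It does not use TCache's API: `Person`, `Db`, `writeDeferred`, `indexOne`, `startIndexer` and `keysInCity` are all hypothetical names, the pending log lives in memory rather than on disk (so resuming after a restart is not modelled), and only newly inserted records are handled.

```haskell
import Control.Concurrent (ThreadId, forkIO)
import Control.Concurrent.STM
import Control.Monad (forever)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical record type; not part of TCache.
data Person = Person { pKey :: String, pCity :: String } deriving (Eq, Show)

-- Toy stand-ins for the cache, one index, and the log of unindexed keys.
data Db = Db
  { dbRecords :: TVar (Map.Map String Person)            -- key  -> record
  , dbByCity  :: TVar (Map.Map String (Set.Set String))  -- city -> keys
  , dbPending :: TVar [String]                           -- keys awaiting indexing
  }

newDb :: IO Db
newDb = Db <$> newTVarIO Map.empty <*> newTVarIO Map.empty <*> newTVarIO []

-- Store the record and log its key in one transaction; the index is untouched.
writeDeferred :: Db -> Person -> STM ()
writeDeferred db p = do
  modifyTVar' (dbRecords db) (Map.insert (pKey p) p)
  modifyTVar' (dbPending db) (pKey p :)

-- Index one logged key; blocks (via retry) while the log is empty.
indexOne :: Db -> STM ()
indexOne db = do
  pending <- readTVar (dbPending db)
  case pending of
    []       -> retry
    (k : ks) -> do
      writeTVar (dbPending db) ks
      recs <- readTVar (dbRecords db)
      case Map.lookup k recs of
        Nothing -> pure ()
        Just p  -> modifyTVar' (dbByCity db)
                     (Map.insertWith Set.union (pCity p) (Set.singleton k))

-- Background worker in the spirit of the proposal: keeps draining the log.
startIndexer :: Db -> IO ThreadId
startIndexer db = forkIO (forever (atomically (indexOne db)))

-- Index lookup; it can miss records whose keys are still in the pending log.
keysInCity :: Db -> String -> STM (Set.Set String)
keysInCity db city =
  Map.findWithDefault Set.empty city <$> readTVar (dbByCity db)
```

In this model, a lookup made right after `writeDeferred` and before the worker has run does not see the new record, which is exactly the eventual-consistency trade-off the idea implies.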

@agocorona
Owner

agocorona commented Feb 17, 2019

The problem with that approach is that it could produce erroneous query results: imagine that you ask for the registers in which a field has a certain value, and some register has not yet been indexed. The result would be inaccurate. For this reason, indexing happens in the same atomic block that updates the register. An alternative approach is to defer the index updates but force all of them to complete before any query runs. This would need some thread synchronization that blocks further updates during the query, or else the query would have to be performed optimistically and invalidated if more index updates were in flight at that moment, so that it is run again, and so on. This is equivalent to reimplementing STM at a different level. I think it is worth investigating first whether the slowdown justifies the pain of the change. In any case, I think laziness helps, since it defers some operations until they are needed. Note also that the indexing happens in user-space memory, so it should be much faster than in conventional databases.
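The two options described in this comment can be contrasted with a small sketch. It is a toy model over plain `TVar`s, not TCache's actual implementation: `Db`, `writeIndexed`, `writeDeferred`, `flushPending`, `lookupStale` and `lookupConsistent` are hypothetical names, and only newly inserted keys are handled (re-indexing a changed field is left out).

```haskell
import Control.Concurrent.STM
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Key   = String
type Field = Int   -- value of the indexed field

data Db = Db
  { records    :: TVar (Map.Map Key Field)
  , fieldIndex :: TVar (Map.Map Field (Set.Set Key))
  , pending    :: TVar [(Key, Field)]   -- deferred index updates
  }

newDb :: IO Db
newDb = Db <$> newTVarIO Map.empty <*> newTVarIO Map.empty <*> newTVarIO []

-- Option 1: record and index change in the same atomic block.
writeIndexed :: Db -> Key -> Field -> STM ()
writeIndexed db k v = do
  modifyTVar' (records db) (Map.insert k v)
  modifyTVar' (fieldIndex db) (Map.insertWith Set.union v (Set.singleton k))

-- Option 2: store the record and only log the index update.
writeDeferred :: Db -> Key -> Field -> STM ()
writeDeferred db k v = do
  modifyTVar' (records db) (Map.insert k v)
  modifyTVar' (pending db) ((k, v) :)

-- Apply every deferred index update and clear the log.
flushPending :: Db -> STM ()
flushPending db = do
  ups <- readTVar (pending db)
  writeTVar (pending db) []
  modifyTVar' (fieldIndex db) $ \ix ->
    foldr (\(k, v) -> Map.insertWith Set.union v (Set.singleton k)) ix ups

-- Query that ignores the log: misses records that are not yet indexed.
lookupStale :: Db -> Field -> STM (Set.Set Key)
lookupStale db v =
  Map.findWithDefault Set.empty v <$> readTVar (fieldIndex db)

-- Query that forces all deferred updates first, inside the same transaction.
lookupConsistent :: Db -> Field -> STM (Set.Set Key)
lookupConsistent db v = flushPending db >> lookupStale db v
```

In this toy model the pending log is itself a `TVar`, so running `lookupConsistent` inside `atomically` lets STM rerun the query when a concurrent write touches the log. Whether the same holds once indexing is moved to a separate thread outside STM, which is the case the comment is concerned with, is not something the sketch establishes.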

@bbarker
Contributor Author

bbarker commented Feb 18, 2019

> The problem with that approach is that it could produce erroneous query results: imagine that you ask for the registers in which a field has a certain value, and some register has not yet been indexed. The result would be inaccurate. For this reason, indexing happens in the same atomic block that updates the register.

Good point. I was thinking that the user of the library, in this case, would explicitly be committing to being eventually consistent, rather than always consistent.

> An alternative approach is to defer the index updates but force all of them to complete before any query runs. This would need some thread synchronization that blocks further updates during the query, or else the query would have to be performed optimistically and invalidated if more index updates were in flight at that moment, so that it is run again, and so on. This is equivalent to reimplementing STM at a different level.

This made me wonder (and I took a quick look but am new to STM), what is the call in index that causes it to block?

> I think it is worth investigating first whether the slowdown justifies the pain of the change. In any case, I think laziness helps, since it defers some operations until they are needed. Note also that the indexing happens in user-space memory, so it should be much faster than in conventional databases.

I'll see if I can come up with a benchmark example; it seemed to be a deal breaker for me at the time, but I was doing everything with the default back-end (file-based). I will be working on switching to Cassandra very soon, but am unsure as yet how this affects the stored index.

@agocorona
Owner

agocorona commented Feb 18, 2019

> Good point. I was thinking that the user of the library, in this case, would explicitly be committing to being eventually consistent, rather than always consistent.

In summary, I can make this optional, so that in the setup the programmer can specify the kind of consistency they want. I have to look at the details first and think about it.
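As an illustration of what such a setup option might look like, here is a small pure sketch. Nothing in it is TCache API: the `Consistency` type, the `Db` record and the functions `write`, `catchUp` and `query` are hypothetical names chosen for the example, and only newly inserted keys are handled.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical setup option; no such setting is confirmed to exist in TCache.
data Consistency = Immediate | Eventual deriving (Eq, Show)

-- Pure toy database: the mode is fixed when the database is created.
data Db = Db
  { dbMode    :: Consistency
  , dbRecords :: Map.Map String Int            -- key -> indexed field value
  , dbIndex   :: Map.Map Int (Set.Set String)  -- field value -> keys
  , dbPending :: [(String, Int)]               -- index updates not yet applied
  } deriving (Eq, Show)

emptyDb :: Consistency -> Db
emptyDb mode = Db mode Map.empty Map.empty []

addToIndex :: (String, Int)
           -> Map.Map Int (Set.Set String)
           -> Map.Map Int (Set.Set String)
addToIndex (k, v) = Map.insertWith Set.union v (Set.singleton k)

-- The record is always stored; the mode decides when the index is updated.
write :: String -> Int -> Db -> Db
write k v db = case dbMode db of
  Immediate -> stored { dbIndex = addToIndex (k, v) (dbIndex db) }
  Eventual  -> stored { dbPending = (k, v) : dbPending db }
  where
    stored = db { dbRecords = Map.insert k v (dbRecords db) }

-- Apply all pending index updates (what a background indexer would do).
catchUp :: Db -> Db
catchUp db = db { dbIndex   = foldr addToIndex (dbIndex db) (dbPending db)
                , dbPending = [] }

-- Index lookup; under Eventual it reflects only updates applied so far.
query :: Int -> Db -> Set.Set String
query v db = Map.findWithDefault Set.empty v (dbIndex db)
```

With `emptyDb Immediate`, a `write` is visible to `query` straight away; with `emptyDb Eventual`, it becomes visible only after `catchUp` has run.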

> This made me wonder (and I took a quick look but am new to STM), what is the call in index that causes it to block?

No, I meant that if asynchronous indexing AND consistency are both required, then something like a re-implementation of STM on top of STM would be necessary. But never mind.

> I'll see if I can come up with a benchmark example; it seemed to be a deal breaker for me at the time, but I was doing everything with the default back-end (file-based). I will be working on switching to Cassandra very soon, but am unsure as yet how this affects the stored index.

Cassandra would not imply any change other than slower or faster indexing, since indexing happens in memory.

By the way, I'm working on a distributed database based on TCache.

@bbarker
Contributor Author

bbarker commented Feb 26, 2019

Sorry for the delay in looking into this. I have confirmed one thing, though: TCache does generate index files. For example, for my SxRecord data type, I have:

.tcachedata/index-SxRecord*
.tcachedata/index-SxRecordCowMark  .tcachedata/index-SxRecordSxId  .tcachedata/index-SxRecordUTCTime
