Add asynchronous indexing? #10

bbarker · 2019-02-15T21:27:40Z

In Data.TCache.IndexQuery, index causes any new DBRefs to be indexed immediately. In some cases, it may be better to have an alternative to index, say eventuallyIndex, that kicks off another thread to handle indexing and writes a log of the records that still need to be indexed (a log entry could just be a duplicate of the actual record, or at least its key, as this might be fastest).

If the application or server is halted before indexing is completed, the next time eventuallyIndex is run, it will kick off the thread again and resume working on any logs, if they exist.

This could help where performance is important or a limiting factor; I suspect (though am not sure) that bulk indexing could potentially be significantly faster than the current one-at-a-time method.

Just an idea - maybe there are better ways to do it.
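To make the proposal concrete, here is a minimal sketch of deferred indexing with a pending log and a background worker. It is an in-memory model only: `Store`, `insertDeferred`, `indexPending` and this `eventuallyIndex` are hypothetical names, not part of TCache, and the on-disk log that would let indexing resume after a restart is left out. It also assumes each key is written once, so stale index entries are never removed.

```haskell
import Control.Concurrent (ThreadId, forkIO)
import Control.Concurrent.STM
import Control.Monad (forever)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical in-memory model; none of these names come from TCache.
type Key = String

data Store = Store
  { records    :: TVar (Map.Map Key Int)            -- key -> indexed field value
  , fieldIndex :: TVar (Map.Map Int (Set.Set Key))  -- field value -> keys
  , pendingLog :: TVar [Key]                        -- keys written but not yet indexed
  }

newStore :: IO Store
newStore = Store <$> newTVarIO Map.empty <*> newTVarIO Map.empty <*> newTVarIO []

-- Write a record and only log its key; the index is not touched here.
insertDeferred :: Store -> Key -> Int -> STM ()
insertDeferred s k v = do
  modifyTVar' (records s) (Map.insert k v)
  modifyTVar' (pendingLog s) (k :)

-- Index every logged key in one transaction (the "bulk" step).
-- Blocks via retry while the log is empty; returns how many keys were indexed.
indexPending :: Store -> STM Int
indexPending s = do
  ks <- readTVar (pendingLog s)
  if null ks
    then retry
    else do
      rs <- readTVar (records s)
      let entries = [ (v, Set.singleton k) | k <- ks, Just v <- [Map.lookup k rs] ]
          batch   = Map.fromListWith Set.union entries
      modifyTVar' (fieldIndex s) (\ix -> Map.unionWith Set.union ix batch)
      writeTVar (pendingLog s) []
      return (length ks)

-- Start the background indexer (assumption: one worker per store).
eventuallyIndex :: Store -> IO ThreadId
eventuallyIndex s = forkIO (forever (atomically (indexPending s)))

-- Usage: write three records, wait for the log to drain, then read the index.
demo :: IO (Set.Set Key)
demo = do
  s <- newStore
  _ <- eventuallyIndex s
  atomically $ mapM_ (uncurry (insertDeferred s)) [("a", 1), ("b", 2), ("c", 1)]
  atomically $ do
    ks <- readTVar (pendingLog s)
    if null ks
      then Map.findWithDefault Set.empty 1 <$> readTVar (fieldIndex s)
      else retry
```

Between `insertDeferred` and the worker's next run, a lookup in `fieldIndex` can miss recently written keys, which is the eventual-consistency trade-off this design accepts.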

agocorona · 2019-02-17T22:57:52Z

The problem with that approach is that it could produce erroneous results in queries: imagine that you ask for the records in which a field has a certain value, and some record has not yet been indexed. The result would not be accurate. For this reason, indexing happens in the same atomic block that updates the record.

An alternative approach is to defer the index updates but force all of them before making any query. This would need some thread synchronization that either blocks further updates during the query, or performs the query optimistically and invalidates it if more index updates were in flight at that moment, so that the query has to be performed again, and so on. This is equivalent to reimplementing STM at a different level.

I think it is worth investigating whether the slowdown justifies the pain of the change. In any case, I think laziness helps, since it defers some operations until they are needed. Note also that the indexing is done in user-space memory, so it should be much faster than in conventional databases.
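The difference between the variants discussed here can be sketched with a small in-memory model. All names below (`Store`, `insertIndexed`, `insertDeferred`, `queryStale`, `queryConsistent`) are hypothetical and not TCache functions. The sketch assumes records, index and pending log all live in `TVar`s in one process, so STM itself provides the synchronization; it does not address the harder case in which index updates run outside the transaction.

```haskell
import Control.Concurrent.STM
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical in-memory model; none of these names come from TCache.
type Key = String

data Store = Store
  { records    :: TVar (Map.Map Key Int)            -- key -> indexed field value
  , fieldIndex :: TVar (Map.Map Int (Set.Set Key))  -- field value -> keys
  , pendingLog :: TVar [Key]                        -- keys written but not yet indexed
  }

newStore :: IO Store
newStore = Store <$> newTVarIO Map.empty <*> newTVarIO Map.empty <*> newTVarIO []

addToIndex :: Store -> Key -> Int -> STM ()
addToIndex s k v =
  modifyTVar' (fieldIndex s) (Map.insertWith Set.union v (Set.singleton k))

-- Record and index change in the same transaction (the behaviour described above).
insertIndexed :: Store -> Key -> Int -> STM ()
insertIndexed s k v = do
  modifyTVar' (records s) (Map.insert k v)
  addToIndex s k v

-- Record is written now; its index update is only logged.
insertDeferred :: Store -> Key -> Int -> STM ()
insertDeferred s k v = do
  modifyTVar' (records s) (Map.insert k v)
  modifyTVar' (pendingLog s) (k :)

-- Reads only the index, so it can miss records whose indexing was deferred.
queryStale :: Store -> Int -> STM (Set.Set Key)
queryStale s v = Map.findWithDefault Set.empty v <$> readTVar (fieldIndex s)

-- Apply every deferred index update and clear the log.
flushPending :: Store -> STM ()
flushPending s = do
  ks <- readTVar (pendingLog s)
  rs <- readTVar (records s)
  sequence_ [ addToIndex s k v | k <- ks, Just v <- [Map.lookup k rs] ]
  writeTVar (pendingLog s) []

-- Force all pending index updates, then query, inside one transaction.
queryConsistent :: Store -> Int -> STM (Set.Set Key)
queryConsistent s v = flushPending s >> queryStale s v

-- Usage: "a" is indexed immediately, "b" is deferred.
demo :: IO (Set.Set Key, Set.Set Key)
demo = do
  s <- newStore
  atomically $ insertIndexed s "a" 1
  atomically $ insertDeferred s "b" 1
  stale <- atomically (queryStale s 1)       -- misses "b"
  fresh <- atomically (queryConsistent s 1)  -- sees "a" and "b"
  return (stale, fresh)
```

Here `queryStale` shows the inaccurate result, while `queryConsistent` pays the indexing cost at query time instead of at write time.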

bbarker · 2019-02-18T01:20:48Z

> The problem with that approach is that it could produce erroneous results in queries: imagine that you ask for the records in which a field has a certain value, and some record has not yet been indexed. The result would not be accurate. For this reason, indexing happens in the same atomic block that updates the record.

Good point. I was thinking that the user of the library, in this case, would explicitly be committing to being eventually consistent, rather than always consistent.

> An alternative approach is to defer the index updates but force all of them before making any query. This would need some thread synchronization that either blocks further updates during the query, or performs the query optimistically and invalidates it if more index updates were in flight at that moment, so that the query has to be performed again, and so on. This is equivalent to reimplementing STM at a different level.

This made me wonder (and I took a quick look but am new to STM), what is the call in index that causes it to block?

> I think it is worth investigating whether the slowdown justifies the pain of the change. In any case, I think laziness helps, since it defers some operations until they are needed. Note also that the indexing is done in user-space memory, so it should be much faster than in conventional databases.

I'll see if I can come up with a benchmark example; it seemed to be a deal breaker for me at the time, but I was doing everything with the default back-end (file-based). I will be working on switching to Cassandra very soon but am not yet sure how this affects the stored index.

agocorona · 2019-02-18T18:14:35Z

> Good point. I was thinking that the user of the library, in this case, would explicitly be committing to being eventually consistent, rather than always consistent.

In short, I can make this optional, so that in the setup the programmer can specify the kind of consistency they want. I have to look at the details first and think about it.

> This made me wonder (and I took a quick look but am new to STM), what is the call in index that causes it to block?

No, I meant that if asynchronous indexing AND consistency are both required, then something like a re-implementation of STM on top of STM would be necessary. But never mind.

> I'll see if I can come up with a benchmark example; it seemed to be a deal breaker for me at the time, but I was doing everything with the default back-end (file-based). I will be working on switching to Cassandra very soon but am not yet sure how this affects the stored index.

Cassandra should not change anything other than making indexing slower or faster, since indexing happens in memory.

By the way, I'm working on a distributed database based on TCache.

bbarker · 2019-02-26T14:32:06Z

Sorry for the delay in looking into this. I have confirmed one thing though. TCache does generate index files, e.g. for my SxRecord data type, I have:

.tcachedata/index-SxRecord*
.tcachedata/index-SxRecordCowMark  .tcachedata/index-SxRecordSxId  .tcachedata/index-SxRecordUTCTime
