UniqueIndex query performance improvement #106

uujava · 2016-11-18T21:05:39Z

Opened for discussion.
Please see details in pull request #105

npgall · 2016-11-18T21:51:11Z

Merged!

I like that pull request, nice work!

I made some minor adjustments:

You had a shortcut to delegate directly to the UniqueIndex if possible, on the first line of the CollectionQueryEngine.retrieve() method. However, this would bypass the code which orders results. As the UniqueIndex supports in() queries as well, it can sometimes return more than one object in the ResultSet, so I think we still need to order results. However the other place where you delegate to the UniqueIndex in getResultSetWithLowestRetrievalCost(), should cover that. The question is what the performance would be like. Can you re-test it?
I moved the code which performs instanceof checks, into the UniqueIndex.supportsQuery method, as this is the purpose of that method. I overrode the superclass implementation to ensure performance should be the same.
In the addAttributeIndex() method, I moved the new code for UniqueIndex code outside of an if() block, as there was an edge case where it would only work if the UniqueIndex was the first index added.

Can you re-test with the latest code?

uujava · 2016-11-19T19:54:51Z

It's now Ok, but not perfect :)

I've updated my tests a bit here in case you'll need it.

Let's consider following timings

Benchmark	Average	Units	Description
HashTest.queryHash	0.530	us/op	Top time I'd like to get
QueryTest.queryHash	1.141	us/op	Before the fix
QueryTest.queryHash	0.837	us/op	Compiled from current master. Current state
QueryTest.queryHash	0.679	us/op	Change-1 (see below)
QueryTest.queryHash	0.540	us/op	Change-2 (see below)

Change-1
Add shortcut for Equals query and uniqueIndexes in CollectionQueryEngine#retrieve as below:

if(query instanceof Equal) {
          Index<O> uniqueIndex = uniqueIndexes.get(((Equal)query).getAttribute());
          if (uniqueIndex != null && uniqueIndex.supportsQuery(query, queryOptions)) {
              return uniqueIndex.retrieve(query, queryOptions);
          }
}

This saves 0.140 us/op or 19% saving. I guess it's safe, as it we do not need any queryOptions processing for the single Equals query.

Change-2
Comment out two first lines in ConcurrentIndexedCollection#retrieve

    public ResultSet<O> retrieve(Query<O> query, QueryOptions queryOptions) {
        final QueryOptions queryOptionsOpened = queryOptions;
        // final QueryOptions queryOptionsOpened = openRequestScopeResourcesIfNecessary(queryOptions);
       // flagAsReadRequest(queryOptionsOpened)

It's definitely would not work now, but would save another 0.140 us/op and we will be pretty close to HashTest.queryHash timings.
Those two lines commented actually put two values to query options hash ad flags:

...
queryOptions.put(Persistence.class, persistence);
...
FlagsEnabled.forQueryOptions(queryOptions).add(PersistenceFlags.READ_REQUEST);

And this is rather expensive for this kind of queries. I guess It's unfixable in current version. I'd have in QueryOptions fields instead of storing in hash flags and options that are known at compile time. This would definitely faster. For now I'm just checking those option and flag already present in the query - it's a bit faster than putting them to map.
This will finally give 0.594 us/op that is only 11% slower than hash. I'd say it's perfect for now.
Will commit in a minute.

uujava · 2016-11-19T20:26:14Z

Add pull request #107
Please review.

Thank you for your time.

npgall · 2016-11-20T03:31:16Z

Thanks for the additional suggestions.

Re: Change 1 - this would need to be in the retrieveRecursive() method instead of the retrieve() method, in order for it to work with nested queries.

Re: Change 2 - I created an experimental branch to see if we could measure the performance difference by not allocating HashMaps in QueryOptions. It's a work in progress: https://github.com/npgall/cqengine/blob/optimize-query-options/code/src/main/java/com/googlecode/cqengine/query/option/QueryOptions.java

uujava · 2016-11-21T13:51:35Z

Re: Re: Change 1 - this would need to be in the retrieveRecursive() method instead of the retrieve() method, in order for it to work with nested queries

nested queries works well as it's properly handled in getResultSetWithLowestRetrievalCost() which is called from retrieveRecursive()
adding the check for Equals in retrieve() is a shortcut to bypass any other checks and return result set as fast as possible.

uujava · 2016-11-21T14:06:59Z

I've experimented further based on the idea from your optiize-query-options branch.
Below is a benchmark values for different options. Note, I'm experimenting on different hardware and values are incomparable to above. Also note that I've added another test that do not cache query options between executions.

Benchmark	Caching	nonCaching	Units	Description
HashTest.queryHash	0.207	0.207	us/op	This is new top time to compare. Note it's 2.5 times faster than above
QueryTest.uniqueQuery	0.768	0.897	us/op	Equals query test for cqengine 2.8.0
QueryTest.uniqueQuery	0.520	0.630	us/op	Compiled from current master. Current state
QueryTest.uniqueQuery	0.422	0.562	us/op	check for Equals in retrieve and check options before queryOptions.put
QueryTest.uniqueQuery	0.406	0.424	us/op	Completely reworked QueryOption and FlagsEnabled to use properties instead of hashes. All changes in my branch HERE

As you could see, latest commits most win when not cache query option object.
I have not investigated why I still have 2 times difference compared to HashTest on a better hardware.

npgall · 2016-11-22T21:23:08Z

We can't represent all of the flags as an enum actually, because we need to allow for extensibility. Several companies using CQEngine (mine included) have their own implementations of indexes and queries, and we need a way to allow index-specific flags to be set, where we don't know the keys in advance.

I have been reconsidering the approach I took in the branch actually. Mainly because of backward compatibility and the number of API changes involved.

I think another option may be to provide a QueryOptions.reset() method, to allow QueryOption objects to be recycled between requests instead. The reset() method would remove request-specific parameters, but would leave existing but empty HashMap and FlagsEnabled objects in place. This way we could avoid the need for new objects to be allocated each time.

I'll see if I can create a branch to test this.

uujava · 2016-11-23T12:16:50Z

I agree about compatibility, but I'm afraid it's hard to get top values with hash (with reset as well).
Note also, using hash, has additional GC pressure (with reset() as well) - each key, value pair wrapped with new HashEntry object. They do probably stay in young gen and be GCed relatively fast, but no gc is better than any gc.
Guess you'll get values close to line#3->nonCaching in the table above (corrected to your hardware) with reset().

npgall · 2016-11-23T17:43:49Z

That's a good point about HashEntry/Node objects. I'll prepare a different branch so we can test it. It would be nice to find a way to avoid the HashEntry allocations.

EDIT: relevant: http://stackoverflow.com/questions/23828425/is-there-a-hashmap-implementation-in-java-that-produces-no-garbage

npgall · 2016-11-29T00:25:46Z

Just a heads up that I've released CQEngine 2.9.1 which includes your first pull request and some of these performance improvements (many thanks!).
I'll keep this issue open until I/we've had time to look at the other optimizations in this issue as well.

uujava · 2016-12-01T07:35:43Z

Thank you on update and the time you spend on cqengine.

uujava mentioned this issue Nov 18, 2016

ExistsIn query performance #102

Closed

uujava added a commit to uujava/cqengine that referenced this issue Nov 19, 2016

improvement for npgall#106 (comment)

4131729

npgall added the enhancement label Jan 18, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UniqueIndex query performance improvement #106

UniqueIndex query performance improvement #106

uujava commented Nov 18, 2016

npgall commented Nov 18, 2016

uujava commented Nov 19, 2016 •

edited

Loading

uujava commented Nov 19, 2016

npgall commented Nov 20, 2016

uujava commented Nov 21, 2016

uujava commented Nov 21, 2016 •

edited

Loading

npgall commented Nov 22, 2016

uujava commented Nov 23, 2016 •

edited

Loading

npgall commented Nov 23, 2016 •

edited

Loading

npgall commented Nov 29, 2016 •

edited

Loading

uujava commented Dec 1, 2016

UniqueIndex query performance improvement #106

UniqueIndex query performance improvement #106

Comments

uujava commented Nov 18, 2016

npgall commented Nov 18, 2016

uujava commented Nov 19, 2016 • edited Loading

uujava commented Nov 19, 2016

npgall commented Nov 20, 2016

uujava commented Nov 21, 2016

uujava commented Nov 21, 2016 • edited Loading

npgall commented Nov 22, 2016

uujava commented Nov 23, 2016 • edited Loading

npgall commented Nov 23, 2016 • edited Loading

npgall commented Nov 29, 2016 • edited Loading

uujava commented Dec 1, 2016

uujava commented Nov 19, 2016 •

edited

Loading

uujava commented Nov 21, 2016 •

edited

Loading

uujava commented Nov 23, 2016 •

edited

Loading

npgall commented Nov 23, 2016 •

edited

Loading

npgall commented Nov 29, 2016 •

edited

Loading