Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UniqueIndex query performance improvement #106

Open
uujava opened this issue Nov 18, 2016 · 11 comments
Open

UniqueIndex query performance improvement #106

uujava opened this issue Nov 18, 2016 · 11 comments

Comments

@uujava
Copy link
Contributor

uujava commented Nov 18, 2016

Opened for discussion.
Please see details in pull request #105

@npgall
Copy link
Owner

npgall commented Nov 18, 2016

Merged!

I like that pull request, nice work!

I made some minor adjustments:

  • You had a shortcut to delegate directly to the UniqueIndex if possible, on the first line of the CollectionQueryEngine.retrieve() method. However, this would bypass the code which orders results. As the UniqueIndex supports in() queries as well, it can sometimes return more than one object in the ResultSet, so I think we still need to order results. However the other place where you delegate to the UniqueIndex in getResultSetWithLowestRetrievalCost(), should cover that. The question is what the performance would be like. Can you re-test it?
  • I moved the code which performs instanceof checks, into the UniqueIndex.supportsQuery method, as this is the purpose of that method. I overrode the superclass implementation to ensure performance should be the same.
  • In the addAttributeIndex() method, I moved the new code for UniqueIndex code outside of an if() block, as there was an edge case where it would only work if the UniqueIndex was the first index added.

Can you re-test with the latest code?

@uujava
Copy link
Contributor Author

uujava commented Nov 19, 2016

It's now Ok, but not perfect :)

I've updated my tests a bit here in case you'll need it.

Let's consider following timings

Benchmark Average Units Description
HashTest.queryHash 0.530 us/op Top time I'd like to get
QueryTest.queryHash 1.141 us/op Before the fix
QueryTest.queryHash 0.837 us/op Compiled from current master. Current state
QueryTest.queryHash 0.679 us/op Change-1 (see below)
QueryTest.queryHash 0.540 us/op Change-2 (see below)
  • Change-1
    Add shortcut for Equals query and uniqueIndexes in CollectionQueryEngine#retrieve as below:

    if(query instanceof Equal) {
              Index<O> uniqueIndex = uniqueIndexes.get(((Equal)query).getAttribute());
              if (uniqueIndex != null && uniqueIndex.supportsQuery(query, queryOptions)) {
                  return uniqueIndex.retrieve(query, queryOptions);
              }
    }
    

    This saves 0.140 us/op or 19% saving. I guess it's safe, as it we do not need any queryOptions processing for the single Equals query.

  • Change-2
    Comment out two first lines in ConcurrentIndexedCollection#retrieve

    public ResultSet<O> retrieve(Query<O> query, QueryOptions queryOptions) {
        final QueryOptions queryOptionsOpened = queryOptions;
        // final QueryOptions queryOptionsOpened = openRequestScopeResourcesIfNecessary(queryOptions);
       // flagAsReadRequest(queryOptionsOpened)

It's definitely would not work now, but would save another 0.140 us/op and we will be pretty close to HashTest.queryHash timings.
Those two lines commented actually put two values to query options hash ad flags:

...
queryOptions.put(Persistence.class, persistence);
...
FlagsEnabled.forQueryOptions(queryOptions).add(PersistenceFlags.READ_REQUEST);

And this is rather expensive for this kind of queries. I guess It's unfixable in current version. I'd have in QueryOptions fields instead of storing in hash flags and options that are known at compile time. This would definitely faster. For now I'm just checking those option and flag already present in the query - it's a bit faster than putting them to map.
This will finally give 0.594 us/op that is only 11% slower than hash. I'd say it's perfect for now.
Will commit in a minute.

uujava added a commit to uujava/cqengine that referenced this issue Nov 19, 2016
@uujava
Copy link
Contributor Author

uujava commented Nov 19, 2016

Add pull request #107
Please review.

Thank you for your time.

@npgall
Copy link
Owner

npgall commented Nov 20, 2016

Thanks for the additional suggestions.

Re: Change 1 - this would need to be in the retrieveRecursive() method instead of the retrieve() method, in order for it to work with nested queries.

Re: Change 2 - I created an experimental branch to see if we could measure the performance difference by not allocating HashMaps in QueryOptions. It's a work in progress: https://github.com/npgall/cqengine/blob/optimize-query-options/code/src/main/java/com/googlecode/cqengine/query/option/QueryOptions.java

@uujava
Copy link
Contributor Author

uujava commented Nov 21, 2016

Re: Re: Change 1 - this would need to be in the retrieveRecursive() method instead of the retrieve() method, in order for it to work with nested queries

nested queries works well as it's properly handled in getResultSetWithLowestRetrievalCost() which is called from retrieveRecursive()
adding the check for Equals in retrieve() is a shortcut to bypass any other checks and return result set as fast as possible.

@uujava
Copy link
Contributor Author

uujava commented Nov 21, 2016

I've experimented further based on the idea from your optiize-query-options branch.
Below is a benchmark values for different options. Note, I'm experimenting on different hardware and values are incomparable to above. Also note that I've added another test that do not cache query options between executions.

Benchmark Caching nonCaching Units Description
HashTest.queryHash 0.207 0.207 us/op This is new top time to compare. Note it's 2.5 times faster than above
QueryTest.uniqueQuery 0.768 0.897 us/op Equals query test for cqengine 2.8.0
QueryTest.uniqueQuery 0.520 0.630 us/op Compiled from current master. Current state
QueryTest.uniqueQuery 0.422 0.562 us/op check for Equals in retrieve and check options before queryOptions.put
QueryTest.uniqueQuery 0.406 0.424 us/op Completely reworked QueryOption and FlagsEnabled to use properties instead of hashes. All changes in my branch HERE

As you could see, latest commits most win when not cache query option object.
I have not investigated why I still have 2 times difference compared to HashTest on a better hardware.

@npgall
Copy link
Owner

npgall commented Nov 22, 2016

We can't represent all of the flags as an enum actually, because we need to allow for extensibility. Several companies using CQEngine (mine included) have their own implementations of indexes and queries, and we need a way to allow index-specific flags to be set, where we don't know the keys in advance.

I have been reconsidering the approach I took in the branch actually. Mainly because of backward compatibility and the number of API changes involved.

I think another option may be to provide a QueryOptions.reset() method, to allow QueryOption objects to be recycled between requests instead. The reset() method would remove request-specific parameters, but would leave existing but empty HashMap and FlagsEnabled objects in place. This way we could avoid the need for new objects to be allocated each time.

I'll see if I can create a branch to test this.

@uujava
Copy link
Contributor Author

uujava commented Nov 23, 2016

I agree about compatibility, but I'm afraid it's hard to get top values with hash (with reset as well).
Note also, using hash, has additional GC pressure (with reset() as well) - each key, value pair wrapped with new HashEntry object. They do probably stay in young gen and be GCed relatively fast, but no gc is better than any gc.
Guess you'll get values close to line#3->nonCaching in the table above (corrected to your hardware) with reset().

@npgall
Copy link
Owner

npgall commented Nov 23, 2016

That's a good point about HashEntry/Node objects. I'll prepare a different branch so we can test it. It would be nice to find a way to avoid the HashEntry allocations.

EDIT: relevant: http://stackoverflow.com/questions/23828425/is-there-a-hashmap-implementation-in-java-that-produces-no-garbage

@npgall
Copy link
Owner

npgall commented Nov 29, 2016

Just a heads up that I've released CQEngine 2.9.1 which includes your first pull request and some of these performance improvements (many thanks!).
I'll keep this issue open until I/we've had time to look at the other optimizations in this issue as well.

@uujava
Copy link
Contributor Author

uujava commented Dec 1, 2016

Thank you on update and the time you spend on cqengine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants