MongoCollection#countDocuments Performance in Driver 4.x [DATAMONGO-2669] #3522
A similar issue occurs with MongoTemplate::count(Query query, ...). Under the hood, it uses an aggregation similar to this:
And when the query is empty, i.e. when counting all tasks, a full collection scan is performed. For large collections this is painfully slow, and it can really impair pagination in a Spring Data REST project.
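For illustration, here is a sketch of the pipeline that, as far as we understand the 4.x driver, `MongoCollection#countDocuments` sends: a `$match` on the filter, an optional `$skip`/`$limit`, then a `$group` that adds one per document. The class and method names are hypothetical, and the stages are plain strings rather than BSON. It only shows why an empty filter `{}` feeds every document of the collection into `$group`.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (hypothetical names, not driver code): renders the
// aggregation stages countDocuments is understood to send, as JSON-like strings.
final class CountDocumentsPipeline {

    static List<String> countPipeline(String filterJson, int skip, int limit) {
        List<String> stages = new ArrayList<>();
        // Every document matching the filter reaches $group; with an empty
        // filter "{}" that is every document in the collection.
        stages.add("{\"$match\": " + filterJson + "}");
        if (skip > 0) {
            stages.add("{\"$skip\": " + skip + "}");
        }
        if (limit > 0) {
            stages.add("{\"$limit\": " + limit + "}");
        }
        // Collapse the matched documents into a single counter document.
        stages.add("{\"$group\": {\"_id\": 1, \"n\": {\"$sum\": 1}}}");
        return stages;
    }

    public static void main(String[] args) {
        countPipeline("{}", 0, 0).forEach(System.out::println);
    }
}
```

By contrast, `estimatedDocumentCount` takes no filter at all and answers from collection metadata, which is why it is only an option for the empty-filter case.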
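MongoDB's `db.collection.count()` operates on collection statistics, which may be inaccurate. Nevertheless, we're aware of the impact this has and have already reached out to our colleagues at MongoDB to find a solution for this in the future (maybe based on `$collStats: { count: { } }`). One workaround could be a custom `MongoTemplate` subclass that overrides `doCount`:

```java
import com.mongodb.client.model.CountOptions;
import org.bson.Document;
import org.springframework.data.mongodb.MongoDatabaseUtils;

@Override
protected long doCount(String collectionName, Document filter, CountOptions options) {
    if (!countCanBeEstimated(filter, options)) {
        // non-empty filter, skip/limit set, or transaction active: exact count
        return super.doCount(collectionName, filter, options);
    }
    // empty filter: answer from collection metadata instead of scanning
    return execute(collectionName, collection -> collection.estimatedDocumentCount());
}

private boolean countCanBeEstimated(Document filter, CountOptions options) {
    return
    // only empty filter for estimatedCount
    filter.isEmpty() &&
    // no skip, no limit,...
    isEmptyOptions(options) &&
    // transaction active?
    !MongoDatabaseUtils.isTransactionActive(getMongoDatabaseFactory());
}

// Hypothetical helper (not shown in the original comment): options count as
// "empty" when neither skip nor limit is set.
private boolean isEmptyOptions(CountOptions options) {
    return options.getLimit() <= 0 && options.getSkip() <= 0;
}
```

The reactive variant looks a bit different because the transaction check is deferred: `ReactiveMongoDatabaseUtils.isTransactionActive` returns a `Mono<Boolean>`.

```java
import org.bson.Document;
import com.mongodb.client.model.CountOptions;
import org.springframework.data.mongodb.ReactiveMongoDatabaseUtils;
import reactor.core.publisher.Mono;

@Override
protected Mono<Long> doCount(String collectionName, Document filter, CountOptions options) {
    // Here countCanBeEstimated checks only filter and options;
    // the transaction state is resolved reactively below.
    if (!countCanBeEstimated(filter, options)) {
        return super.doCount(collectionName, filter, options);
    }
    return ReactiveMongoDatabaseUtils.isTransactionActive(getMongoDatabaseFactory())
            .flatMap(txActive -> {
                if (txActive) {
                    // inside a lambda, super refers to the enclosing template's superclass
                    return super.doCount(collectionName, filter, options);
                }
                return createMono(collectionName,
                        collection -> collection.estimatedDocumentCount());
            });
}
```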
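The decision made by `countCanBeEstimated` can be exercised without a database. The following is a plain-Java model under stated assumptions: the filter is a `Map` (as `org.bson.Document` is), skip/limit and the transaction flag are passed in directly, and `CountStrategy` is a hypothetical name, not Spring Data code.

```java
import java.util.Map;

// Hypothetical plain-Java model of the countCanBeEstimated check.
final class CountStrategy {

    enum Kind { ESTIMATED, EXACT }

    static boolean countCanBeEstimated(Map<String, Object> filter, int skip, int limit,
                                       boolean transactionActive) {
        return filter.isEmpty()             // estimatedDocumentCount takes no filter
                && skip <= 0 && limit <= 0  // ...nor skip/limit
                && !transactionActive;      // and cannot be used inside a transaction
    }

    static Kind choose(Map<String, Object> filter, int skip, int limit, boolean transactionActive) {
        return countCanBeEstimated(filter, skip, limit, transactionActive) ? Kind.ESTIMATED : Kind.EXACT;
    }

    public static void main(String[] args) {
        System.out.println(choose(Map.of(), 0, 0, false));                 // ESTIMATED
        System.out.println(choose(Map.of("status", "open"), 0, 0, false)); // EXACT
        System.out.println(choose(Map.of(), 0, 0, true));                  // EXACT
    }
}
```

Only the first call takes the metadata path; any filter, skip/limit, or active transaction falls back to the exact (aggregation-based) count.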
We had hit this issue before with an explicit call to `count(..)` with an empty query on a large collection, so we could quickly identify the subtler, somewhat hidden variant that occurs when relying on Spring Data's query method derivation:
Here the count and thereby the collection scan is triggered in
I don't think what we are doing is unusual, but I couldn't find anybody discussing this aspect.
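As general background, a derived query method that returns a `Page` needs the total number of matches, so Spring Data may issue a count alongside the page query. The sketch below is a simplified model, based on our reading of Spring Data's `PageableExecutionUtils.getPage`, of when that count can be skipped; `PagedCount` and `totalElements` are hypothetical names, and actual behavior may differ between versions.

```java
import java.util.function.LongSupplier;

// Simplified model (hypothetical names) of when a paged query needs a count.
final class PagedCount {

    static long totalElements(long offset, int pageSize, int contentSize, LongSupplier countQuery) {
        if (offset == 0) {
            // First page that is not full: its size already is the total.
            if (pageSize > contentSize) {
                return contentSize;
            }
            // First page that is full: the total is unknown, so the count runs.
            return countQuery.getAsLong();
        }
        // Later page that is non-empty but not full: the total follows from the offset.
        if (contentSize != 0 && pageSize > contentSize) {
            return offset + contentSize;
        }
        // Otherwise the count runs; with an empty filter this is the expensive case.
        return countQuery.getAsLong();
    }

    public static void main(String[] args) {
        LongSupplier count = () -> { System.out.println("count query executed"); return 1_000_000L; };
        System.out.println(totalElements(0, 20, 20, count));
    }
}
```

On a large collection, almost every full page therefore triggers the count. Returning a `Slice` instead avoids it, since a slice only fetches one extra element to determine `hasNext`.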
This commit introduces an option that allows users to opt in to using `estimatedDocumentCount` instead of `countDocuments` when the filter query is empty. To still be able to retrieve the exact number of matching documents, we also introduced `MongoTemplate#exactCount`. Closes: #3522 Original pull request: #3951.
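The opt-in described in the commit message can be modeled as follows. This is a hypothetical sketch, not Spring Data's API: the flag, class, and method names are illustrative, and the two functions stand in for `estimatedDocumentCount` and `countDocuments`.

```java
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.ToLongFunction;

// Hypothetical model of the opt-in; names are illustrative, not Spring Data API.
final class OptInCounter {

    private final boolean estimateWhenPossible;               // the opt-in flag
    private final LongSupplier estimated;                     // stands in for estimatedDocumentCount
    private final ToLongFunction<Map<String, Object>> exact;  // stands in for countDocuments(filter)

    OptInCounter(boolean estimateWhenPossible, LongSupplier estimated,
                 ToLongFunction<Map<String, Object>> exact) {
        this.estimateWhenPossible = estimateWhenPossible;
        this.estimated = estimated;
        this.exact = exact;
    }

    // Regular count: uses the metadata-based estimate only if opted in and the filter is empty.
    long count(Map<String, Object> filter) {
        if (estimateWhenPossible && filter.isEmpty()) {
            return estimated.getAsLong();
        }
        return exact.applyAsLong(filter);
    }

    // Exact count: always runs the exact count, regardless of the flag.
    long exactCount(Map<String, Object> filter) {
        return exact.applyAsLong(filter);
    }

    public static void main(String[] args) {
        OptInCounter counter = new OptInCounter(true, () -> 1_000_000L, f -> 999_998L);
        System.out.println(counter.count(Map.of()));      // 1000000 (estimated)
        System.out.println(counter.exactCount(Map.of())); // 999998 (exact)
    }
}
```

Keeping the estimate opt-in means existing callers keep exact semantics by default, while callers that opted in can still get an exact figure through the separate exact-count method.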
Remove duplicate dependency declaration. See: #3522
Richard Kwasnicki opened DATAMONGO-2669 and commented
When using `SimpleMongoRepository.count()`, MongoDB now performs a collection scan (COLLSCAN) instead of an estimated count based on collection metadata. I had a short discussion with a MongoDB developer at https://developer.mongodb.com/community/forums/t/upgrading-java-mongodriver-3-x-to-4-x-leads-to-lock-with-aggregate-countstrategy/12754
He advised never to run an exact count with an empty query and to use `MongoCollection#estimatedDocumentCount` instead of `MongoCollection#countDocuments`. For us, a count query on a quite large collection took minutes at startup.
Affects: 3.1.1 (2020.0.1)