MongoCollection#countDocuments Performance in Driver 4.x [DATAMONGO-2669] #3522

Closed
spring-projects-issues opened this issue Dec 9, 2020 · 3 comments
Labels: in: core (Issues in core support), type: regression (A regression from a previous release)

spring-projects-issues commented Dec 9, 2020

Richard Kwasnicki opened DATAMONGO-2669 and commented

When using SimpleMongoRepository.count(), MongoDB now performs a collection scan (COLLSCAN) instead of an estimated count based on collection metadata. I had a short discussion about this with a MongoDB developer at https://developer.mongodb.com/community/forums/t/upgrading-java-mongodriver-3-x-to-4-x-leads-to-lock-with-aggregate-countstrategy/12754

He advised never to run an exact count with an empty query, and to use MongoCollection#estimatedDocumentCount instead of MongoCollection#countDocuments in that case.

For us, a count query on a fairly large collection took minutes to complete on startup.


Affects: 3.1.1 (2020.0.1)

@spring-projects-issues spring-projects-issues added status: waiting-for-triage An issue we've not yet triaged in: core Issues in core support labels Dec 30, 2020

choda commented Jan 12, 2021

A similar issue occurs with MongoTemplate::count(Query query, ...). Under the hood, it uses an aggregation similar to this:

db.items.aggregate([
{ $match: {<query>} },
{ $group: { _id: 1, n: { $sum: 1 } } }
])

When the query is empty, i.e. when counting all documents, a full collection scan is performed. For large collections this is painfully slow, and it can seriously impair pagination in a Spring Data REST project.
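
The shape of the pipeline above can be written out in Java without any driver dependency. This is only a rough sketch: plain `java.util` maps stand in for BSON documents, and `CountPipelineSketch` and `countPipeline` are hypothetical names, not part of Spring Data MongoDB or the MongoDB driver. It shows that an empty query still yields a `$match` stage, so every document passes through `$group`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Driver-free sketch: plain maps stand in for BSON documents (assumption). */
public final class CountPipelineSketch {

    private CountPipelineSketch() {}

    /** Builds the two-stage count pipeline for the given filter. */
    public static List<Map<String, Object>> countPipeline(Map<String, Object> filter) {
        // Stage 1: $match with the caller's filter; an empty filter matches every document.
        Map<String, Object> match = Collections.<String, Object>singletonMap("$match", filter);

        // Stage 2: $group all matched documents into one bucket and add 1 per document.
        Map<String, Object> groupSpec = new LinkedHashMap<>();
        groupSpec.put("_id", 1);
        groupSpec.put("n", Collections.<String, Object>singletonMap("$sum", 1));
        Map<String, Object> group = Collections.<String, Object>singletonMap("$group", groupSpec);

        return Arrays.asList(match, group);
    }

    public static void main(String[] args) {
        // Counting "everything": the filter is empty, yet both stages are still present.
        System.out.println(countPipeline(Collections.<String, Object>emptyMap()));
    }
}
```

Because the `$sum` runs once per matched document, the cost grows with the number of documents that reach `$group`, which for an empty filter is the whole collection.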

@mp911de mp911de added the for: team-attention An issue we need to discuss as a team to make progress label Jan 13, 2021
@christophstrobl christophstrobl self-assigned this Jan 15, 2021
@christophstrobl christophstrobl added type: regression A regression from a previous release and removed for: team-attention An issue we need to discuss as a team to make progress status: waiting-for-triage An issue we've not yet triaged labels Jan 15, 2021
@christophstrobl
Member

MongoDB's db.collection.count() operates on collection statistics, which may be inaccurate.
It currently has no support for sessions or transactions, which means that changes such as inserts and removals made during an active session/transaction are visible outside the session and are counted.
Therefore, we will not change the current behavior right now, but we will update the documentation (#3541).

Nevertheless, we're aware of the impact this has and have already reached out to our colleagues at MongoDB to find a solution for this in the future (maybe based on $collStats: { count: { } }).

One workaround could be a custom MongoTemplate implementation overriding the doCount(String, Document, CountOptions) method.

@Override
protected long doCount(String collectionName, Document filter, CountOptions options) {
	
	if(!countCanBeEstimated(filter, options)) {
		return super.doCount(collectionName, filter, options);
	}

	return execute(collectionName, collection -> collection.estimatedDocumentCount());
}

private boolean countCanBeEstimated(Document filter, CountOptions options) {

	return 
		// only empty filter for estimatedCount
		filter.isEmpty() && 
		// no skip, no limit,... 
		isEmptyOptions(options) && 
		// transaction active?
		!MongoDatabaseUtils.isTransactionActive(getMongoDatabaseFactory()); 
}
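
The snippet above calls isEmptyOptions, which is not shown. Below is a minimal, driver-free sketch of the decision, assuming that "empty options" only means that neither skip nor limit was requested. Plain int values stand in for the values a real override would presumably read from CountOptions#getSkip() and CountOptions#getLimit(), the transaction check is passed in as a boolean, and `CountDecisionSketch` is a hypothetical name. Other options, such as a hint or collation, may also need to be taken into account.

```java
import java.util.Map;

/** Driver-free sketch of the decision made by countCanBeEstimated above. */
public final class CountDecisionSketch {

    private CountDecisionSketch() {}

    /** Assumption: "empty options" means neither skip nor limit was requested. */
    public static boolean isEmptyOptions(int skip, int limit) {
        return skip <= 0 && limit <= 0;
    }

    /**
     * An estimate is acceptable only for an empty filter, without skip/limit,
     * and outside of an active transaction.
     */
    public static boolean countCanBeEstimated(Map<String, ?> filter, int skip, int limit,
            boolean transactionActive) {
        return filter.isEmpty() && isEmptyOptions(skip, limit) && !transactionActive;
    }
}
```

With this logic, any filter, any positive skip or limit, or an active transaction falls back to the exact count path.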

The reactive variant looks a bit different due to the deferred transaction evaluation.

@Override
protected Mono<Long> doCount(String collectionName, Document filter, CountOptions options) {

	if(!countCanBeEstimated(filter, options)) {
		return super.doCount(collectionName, filter, options);
	}
	
	return ReactiveMongoDatabaseUtils.isTransactionActive(getMongoDatabaseFactory())
		.flatMap(txActive -> {

			if(txActive) {
				return <subclass-name>.super.doCount(collectionName, filter, options);
			}

			return createMono(collectionName,
				collection -> collection.estimatedDocumentCount());
		});
}

@christophstrobl christophstrobl added the status: on-hold We cannot start working on this issue yet label Jan 21, 2021
@JoergHeinicke5005

We had hit this issue before with an explicit call to count(..) using an empty query on a large collection. That let us quickly identify the subtler, less obvious variant that occurs when relying on Spring Data's query method derivation, as in the following:

public interface EntityRepository extends MongoRepository<Entity, String> {
    Page<Entity> findBy(Pageable pageable);
}

Here the count and thereby the collection scan is triggered in PagedExecution (nested class of MongoQueryExecution), line 140:

long count = operation.matching(Query.of(query).skip(-1).limit(-1)).count();

I don't think what we are doing is unusual, but I couldn't find anybody discussing this aspect.
I would be interested to know whether there is a workaround as easy as the one above, even though this sits much deeper in Spring Data MongoDB's internals.

@christophstrobl christophstrobl linked a pull request Jan 28, 2022 that will close this issue
@christophstrobl christophstrobl removed the status: on-hold We cannot start working on this issue yet label Jan 28, 2022
mp911de added a commit that referenced this issue Mar 11, 2022
Reorder methods. Add links to Javadoc. Tweak wording.

See: #3522
Original pull request: #3951.
mp911de pushed a commit that referenced this issue Mar 11, 2022
This commit introduces an option that allows users to opt in to using estimatedDocumentCount instead of countDocuments when the filter query is empty.
To still be able to retrieve the exact number of matching documents, we also introduced MongoTemplate#exactCount.

Closes: #3522
Original pull request: #3951.
mp911de added a commit that referenced this issue Mar 11, 2022
Remove duplicate dependency declaration.

See: #3522
@mp911de mp911de added this to the 3.4 M4 (2021.2.0) milestone Mar 11, 2022