groupBy followed by flatMap is starting to drop messages under load #1762

Closed
DareUrDream opened this issue Jun 24, 2019 · 11 comments
Labels: status/need-investigation (This needs more in-depth investigation)

@DareUrDream

Re-post of a question on Stack Overflow: https://stackoverflow.com/questions/56726323/reactor-groupby-is-starting-to-drop-messages

Below is my data pipeline definition. It starts to drop messages under load when I am creating, say, hundreds or thousands of groups per minute (a group stays in memory from 15 minutes to 8 hours).
Is there any way to solve this?

I did refer to #1544, #726, #596, and #931, but my question is: even if I pass Integer.MAX_VALUE as the prefetch for groupBy and a substantially large number (or Integer.MAX_VALUE) as maxConcurrency on the flatMap, would it not exhaust those limits eventually and start dropping messages? (Pardon my ignorance of Reactor; I am a newbie trying to learn while working.) The overloads I mean are sketched after the pipeline below.

ds.getPublisher()
        .onBackpressureBuffer(15000)
        .onBackpressureDrop(record -> LOGGER.error("Backpressure applied-1. Dropped records: {}", record))
        .groupBy(record -> record.getGroupId())
        .flatMap(group -> group
                .timeout(Duration.ofSeconds(60))
                .bufferUntil(record -> isGroupComplete(record))
                .bufferTimeout(100, Duration.ofSeconds(5))
                .map(listOflistOfRecords -> listOflistOfRecords.stream().flatMap(List::stream).collect(Collectors.toList()))
                .onErrorContinue((th, records) -> {
                    LOGGER.error("Timedout records: {}", records);
                    // TAKE ACTION ON THE RECORDS
                }))
        .filter(records -> {
            return (publishController.shouldIPublish()) ? true 
                    : records.get(0).getCreatedTimestamp() <= (publishController.stopRequestTimestamp() - 5);
        })
        .doOnDiscard(List.class, records -> {
            if(! records.isEmpty()) {
                LOGGER.error("Discarded: {}", records);
                discardedRecords.put(records, new Object());
            } else {
                LOGGER.error("Empty record received. This should never happen.");
            }
        })
        .map(record -> Collections.unmodifiableList(Enricher.enrich(record)))
        .map(dbRecords -> RecordTransformer.transform(dbRecords))
        .retryBackoff(MAX_RETRY, Duration.ofSeconds(FIRST_BACKOFF_IN_SECONDS), Duration.ofSeconds(MAX_BACKOFF_IN_SECONDS))
        .publishOn(Schedulers.single());
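
For reference, these are the overloads I mean; a minimal sketch (the values here are only illustrative, not a recommendation):

ds.getPublisher()
        .groupBy(record -> record.getGroupId(), 256)    // groupBy(keyMapper, prefetch)
        .flatMap(group -> group.collectList(),
                256,                                    // maxConcurrency
                32);                                    // prefetch per inner group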
@DareUrDream changed the title from "groupBy is starting to drop messages under load" to "groupBy followed by flatMap is starting to drop messages under load" Jun 24, 2019
@bsideup (Contributor) commented Jun 24, 2019

Hi @DareUrDream,

A few things here:

  1. onBackpressureDrop after onBackpressureBuffer - you already applied a backpressure strategy, why twice?
  2. Anything interesting in the logs?
  3. Why do you publish on a single scheduler at the end?

@bsideup (Contributor) commented Jun 24, 2019

even if I include Integer.MAX_VALUE as prefetch for groupBy and say a substantially large number or say Integer.MAX_VALUE for maxConcurrency on the flatMap, would it not run out of those numbers in future and start dropping messages

If you use Integer.MAX_VALUE as the value, it will create an unbounded queue and keep queueing for as long as your app has memory.
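
To illustrate (a rough sketch; Queues is reactor.util.concurrent.Queues):

import java.util.Queue;
import reactor.util.concurrent.Queues;

// A MAX_VALUE "size" selects the unbounded queue supplier, so this queue
// keeps growing until the heap is exhausted.
Queue<Object> queue = Queues.unbounded(Integer.MAX_VALUE).get();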

@DareUrDream (Author) commented Jun 24, 2019

@bsideup Let's say I have twice the memory needed to hold Integer.MAX_VALUE records. What would happen when that much is in memory and, say, another 100K records come in?

@DareUrDream (Author) commented Jun 24, 2019

@bsideup Responses to your observations below:

onBackpressureDrop after onBackpressureBuffer - you already applied a backpressure strategy, why twice?

--> This was the test that helped me figure out that groupBy was becoming a bottleneck for me. It is not there in my production code.

Anything interesting in the logs?

--> Just a help :-)

Why do you publish on a single scheduler at the end?

--> This is interesting... I have a situation where only 2 vCPUs are available, so I am trying to consume on a single scheduler, and my consumer is pretty fast.

@smaldini added the status/need-investigation label Jun 24, 2019
@DareUrDream (Author)

@bsideup Let's say I have twice the memory needed to hold Integer.MAX_VALUE records. What would happen when that much is in memory and, say, another 100K records come in?

This is a very important question for us, as the application keeps running for days, and eventually, in a few months, it will stop working if groupBy starts dropping messages... Do we have an alternative?

@Kindrat commented Oct 24, 2019

@smaldini
There is a default queue size reused from the prefetch arg (Queues.SMALL_BUFFER_SIZE == 256), so it's not possible to create more than 256 groups with the default groupBy Flux factory method.

Can we have the ability to create an unbounded group supplier without setting the prefetch to Integer.MAX_VALUE?
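
A minimal repro of that limit (a hedged sketch; it assumes the default prefetch of Queues.SMALL_BUFFER_SIZE == 256 and consumes groups one at a time):

import reactor.core.publisher.Flux;

public class GroupByLimitRepro {
    public static void main(String[] args) {
        // 500 distinct keys, but groupBy only prefetches 256 elements by default.
        // concatMap subscribes to one group at a time, so once the prefetched
        // elements all belong to not-yet-consumed groups, nothing progresses.
        Flux.range(0, 1_000)
                .groupBy(i -> i % 500)
                .concatMap(group -> group.collectList())
                .doOnNext(batch -> System.out.println(batch))
                .blockLast(); // hangs instead of completing
    }
}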

@Kindrat commented Oct 24, 2019

@smaldini
Also, in Queues:

public static <T> Supplier<Queue<T>> unbounded(int linkSize) {
    if (linkSize == XS_BUFFER_SIZE) {
        return XS_UNBOUNDED;
    }
    else if (linkSize == Integer.MAX_VALUE || linkSize == SMALL_BUFFER_SIZE) {
        return unbounded();
    }
    return () -> new SpscLinkedArrayQueue<>(linkSize);
}

So Integer.MAX_VALUE still falls back to the SMALL_BUFFER_SIZE-linked queue supplier...

@bsideup (Contributor) commented Oct 24, 2019

@Kindrat

Without a mechanism that will "clean up" groups over time, creating Integer.MAX_VALUE groups will lead to a memory leak and may cause OOMs.

@Kindrat commented Oct 25, 2019

@bsideup
Yeah, I'm cleaning up groups manually with an external signal from another publisher and takeUntilOther (see the sketch below). I don't think there could be any default generic mechanism for unbounded dynamic groups.

I guess it would be helpful to mention in the flatMap docs that maxConcurrency could be a problem for continuous groups.
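
For reference, a minimal sketch of that manual cleanup (the keys, timings, and close-signal source are illustrative, not from real production code):

import java.time.Duration;
import java.util.List;
import reactor.core.publisher.Flux;

public class GroupCleanupSketch {
    public static void main(String[] args) {
        // Three long-lived groups keyed by i % 3 (illustrative).
        Flux<Integer> source = Flux.interval(Duration.ofMillis(10))
                .map(Long::intValue)
                .take(300);

        // External close signal: another publisher that names the group to close.
        // concatWith(Flux.never()) keeps the signal open so unrelated groups are
        // not cut short when it completes.
        Flux<Integer> closeSignals = Flux.just(0)
                .delayElements(Duration.ofSeconds(1))
                .concatWith(Flux.never());

        List<List<Integer>> batches = source
                .groupBy(i -> i % 3)
                .flatMap(group -> group
                        // Complete this group when the external signal names its
                        // key, so it stops occupying a groupBy/flatMap slot.
                        .takeUntilOther(closeSignals.filter(k -> k.equals(group.key())))
                        .collectList())
                .collectList()
                .block();

        System.out.println(batches);
    }
}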

@swimmesberger

In our case we do not have unlimited groups; we have a group count somewhere between 10 and 10_000, and it is fixed for the lifetime of the application (so no memory leak here). But we can't use groupBy because of this issue (it took me hours to figure that out...).
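
A hedged sketch of the usual workaround for a bounded group count: size the prefetch and maxConcurrency to at least the maximum cardinality (10_000 here, matching the upper bound above; the pipeline shape is borrowed from the original snippet):

ds.getPublisher()
        .groupBy(record -> record.getGroupId(), 10_000)     // prefetch >= max group count
        .flatMap(group -> group.collectList(), 10_000);     // maxConcurrency >= max group count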

@simonbasle (Contributor)

Closing in favor of #2352, which seems to occur with low cardinality (when immediately cancelling groups).
