Certain queries will cause a massive memory leak when running Loki in monolith mode (v3) #13277

Closed
ASatanicPickle opened this issue Jun 20, 2024 · 1 comment · Fixed by #13501
Labels
needs triage (Issue requires investigation) · type/bug (Something is not working as expected)

Comments

ASatanicPickle commented Jun 20, 2024

Describe the bug

A dramatic memory leak occurs with a specific kind of query when running the all-in-one (-target=all) Loki. v3 is affected; the 2.x line is fine. We had pods spike up to 100 GB in a matter of minutes.

A query that causes the leak to happen:

{stream="stdout",name="loki-canary"} |= "p" | json |= "p"

These queries work fine:

{stream="stdout",name="loki-canary"} |= "p" | json
{stream="stdout",name="loki-canary"} |  json  |= "p"

It seems to be specific to having a line filter both before and after the parser expression. (Note: I didn't test much beyond this, though.)

I did some digging and I think the issue is here:

filters = append(filters, f)

in pkg/logql/syntax/ast.go, in the reorderStages() function, and specifically in how combineFilters() handles those collected filters.

The function has a side effect: it modifies the original request's AST, and with multiple queries running in parallel the AST grows very large. There's a heap dump below.

I was able to get it working by changing this line:

for _, s := range m {
    switch f := s.(type) {
    case *LineFilterExpr:
        filters = append(filters, f)

to

for _, s := range m {
    switch f := s.(type) {
    case *LineFilterExpr:
        filters = append(filters, MustClone(f))

Cloning the filter expressions means combineFilters() no longer modifies the original request's AST.
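
To make the side effect and the effect of cloning concrete, here is a minimal, self-contained Go sketch. The types and functions (lineFilter, combine, combineCloned, clone, chainLen) are toy stand-ins rather than Loki's actual LineFilterExpr, combineFilters(), or MustClone, but they show the same hazard: chaining filters by mutating them in place also rewrites nodes that still belong to the caller's AST, while cloning first leaves the original untouched.

package main

import "fmt"

// lineFilter is a toy stand-in for a line filter AST node, chained via left.
type lineFilter struct {
    match string
    left  *lineFilter
}

// clone makes a deep copy, playing the role MustClone plays in the fix above.
func (f *lineFilter) clone() *lineFilter {
    if f == nil {
        return nil
    }
    return &lineFilter{match: f.match, left: f.left.clone()}
}

// chainLen counts how many filters are chained together starting at f.
func chainLen(f *lineFilter) int {
    n := 0
    for ; f != nil; f = f.left {
        n++
    }
    return n
}

// combine chains the filters by mutating them in place: the nodes it touches
// are the very nodes still referenced by the caller's (shared) AST.
func combine(filters []*lineFilter) *lineFilter {
    result := filters[0]
    for _, f := range filters[1:] {
        f.left = result
        result = f
    }
    return result
}

// combineCloned copies each filter first, so the caller's AST is untouched.
func combineCloned(filters []*lineFilter) *lineFilter {
    copies := make([]*lineFilter, 0, len(filters))
    for _, f := range filters {
        copies = append(copies, f.clone())
    }
    return combine(copies)
}

func main() {
    // The two |= "p" filters of the query, as they sit in the parsed AST.
    ast := []*lineFilter{{match: "p"}, {match: "p"}}

    combineCloned(ast)
    fmt.Println("after cloned combine, second filter chain length:", chainLen(ast[1])) // 1: AST unchanged

    combine(ast)
    fmt.Println("after in-place combine, second filter chain length:", chainLen(ast[1])) // 2: AST mutated
}

With many subqueries planning the same parsed expression in parallel, that in-place mutation is what lets the original request's AST keep growing.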

To Reproduce

Steps to reproduce the behavior:

  1. Run Loki with -target=all
  2. Load it with some data (I used the loki-canary program)
  3. Make sure some chunks get flushed to disk; the bug will not occur if no chunks have been flushed to the store.
  4. Run the query: {stream="stdout",name="loki-canary"} |= "p" | json |= "p" (see the example commands after this list)
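
For concreteness, a minimal local run looks roughly like this (a sketch: the flags are standard loki / loki-canary flags, loki-config.yaml stands for wherever you saved the config shown under Environment below, and you need to wait long enough for chunks to flush before querying):

# 1. Start Loki in monolith mode
loki -config.file=loki-config.yaml -target=all

# 2. Feed it data with the canary
loki-canary -addr=localhost:3100

# 3. Wait for max_chunk_age / chunk_idle_period so some chunks get flushed, then
# 4. run the problematic query against the query_range API
curl -G -s 'http://localhost:3100/loki/api/v1/query_range' \
  --data-urlencode 'query={stream="stdout",name="loki-canary"} |= "p" | json |= "p"'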

Expected behavior

The query should run normally, without the memory spike.

Environment:

  • Infrastructure: repro'd on my laptop and in a k8s environment

The Loki config that I was using locally to repro:

auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  chunk_encoding: snappy
  max_chunk_age: 5m
  chunk_idle_period: 1m

common:
  instance_addr: localhost
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

Heap dump while the leak was occurring:

[image: memleak]

Thanks,
Mark.

@ASatanicPickle changed the title from "Certain queries will cause a memory leak when running Loki in monolith mode" to "Certain queries will cause a memory leak when running Loki in monolith mode (v3)" Jun 20, 2024
@ASatanicPickle changed the title from "Certain queries will cause a memory leak when running Loki in monolith mode (v3)" to "Certain queries will cause a massive memory leak when running Loki in monolith mode (v3)" Jun 24, 2024
@JStickler added the type/bug (Something is not working as expected) and needs triage (Issue requires investigation) labels Jun 24, 2024
@sondrelg

We're seeing something similar. Our primary loki-backend pod is continuously OOMKilled.

[image]
