
Add filequeue functionality #1601

Merged: 16 commits merged into dev.new-wal from file_queue on Sep 6, 2024
Conversation

@mattdurham (Collaborator) commented Sep 3, 2024

FileQueue offers the lowest-level functionality: it writes a block of data and metadata to a file, then pushes the data out. This also serves as the introduction to the actor framework, primarily via the DoWork function. This lets us avoid mutexes and gives us a well-defined workflow that will be reused.
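The mutex-free workflow described above can be sketched with plain channels: a single worker goroutine owns all state, and callers hand it work over a channel. This is an illustrative stand-in for the pattern, not the actual go-actor API or the PR's types (the names `request`, `fileQueue`, and `Store` here are hypothetical).

```go
package main

import "fmt"

// request is a hypothetical unit of work handed to the worker goroutine.
type request struct {
	data []byte
	done chan error
}

// fileQueue owns its state exclusively inside doWork: all mutation happens
// on one goroutine, so no mutex is needed. This mirrors the actor pattern
// the PR adopts via a DoWork-style loop.
type fileQueue struct {
	in    chan request
	files int // touched only by the worker goroutine
}

func newFileQueue() *fileQueue {
	q := &fileQueue{in: make(chan request)}
	go q.doWork()
	return q
}

func (q *fileQueue) doWork() {
	for req := range q.in {
		// The real queue would write data+metadata to a file here;
		// counting writes is enough to show the single-owner state.
		q.files++
		req.done <- nil
	}
}

func (q *fileQueue) Store(data []byte) error {
	done := make(chan error)
	q.in <- request{data: data, done: done}
	return <-done
}

func main() {
	q := newFileQueue()
	for i := 0; i < 3; i++ {
		if err := q.Store([]byte("test")); err != nil {
			panic(err)
		}
	}
	fmt.Println(q.files) // 3
}
```

Because `Store` only returns after the worker replies on `done`, the channel hand-off also gives the happens-before ordering a mutex would otherwise provide.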

@mattdurham marked this pull request as ready for review September 3, 2024 18:32
@mattdurham changed the title from "WIP: Add filequeue section" to "Add filequeue functionality" Sep 3, 2024
@thampiotr (Contributor) left a comment:

First round! It's looking good; happy it's comparatively lean 👍

@mattdurham mattdurham requested a review from thampiotr September 5, 2024 14:42
if err != nil {
return "", err
}
err = q.writeFile(name, rBuf)
@thampiotr (Contributor):

One file = one piece of data implies that we probably want a good level of batching for this queue, and it's probably worth calling out that it's going to work better with larger data.

@mattdurham (Collaborator, author):

In general, yes; in the case of a single node exporter self-scraping, it's kilobytes. In the full PR the default writes 10,000 signals, with a flush timer to ensure it's timely. When testing internally, the files were megabytes after snappy compression.

@thampiotr (Contributor):

Yeah, that will likely work if you write entire scrapes to the queue, and I'm fine with that... if you started writing single samples, there would be a lot of overhead. That's all fine as long as we make it somewhat clear that the queue is optimised only for some usage patterns.

BTW, I'd like to see some metrics on how the queue performs, but I'm happy for this to be in future PRs.

@mattdurham (Collaborator, author):

There are some e2e benchmarks in the full PR. The file queue itself is really cheap, though.

"github.com/vladopajic/go-actor/actor"
)

func BenchmarkFileQueue(t *testing.B) {
@thampiotr (Contributor):

Nit: can you add the current results of the benchmark as a comment, for future reference? I think it can be useful in certain cases.

@mattdurham (Collaborator, author):

Going to drop this benchmark; it's not super useful. There are better end-to-end benchmarks that track this.

require.NoError(t, err)
q.Start()
defer q.Stop()
err = q.Store(context.Background(), nil, []byte("test"))
@thampiotr (Contributor):

I'd imagine we want to create one q and store something t.N times.

require.Len(t, meta, 0)

// Ensure nothing new comes through.
timer := time.NewTicker(100 * time.Millisecond)
@thampiotr (Contributor):

Doesn't this make the benchmark always last 100ms+?

require.NoError(t, err)
q.Start()
defer q.Stop()
err = q.Store(context.Background(), nil, []byte("test"))
@thampiotr (Contributor):

I wonder how it works for different message lengths too.

require.NoError(t, err)

// Send is async so may need to wait a bit for it happen.
require.Eventually(t, func() bool {
@thampiotr (Contributor):

Could this be flaky, when Eventually doesn't catch the 1.committed file existing because it appears and disappears too fast?

It would be good to run this test with -count 100 or something to validate it's not flaky, if you didn't already :)

@thampiotr (Contributor):

I just realised that we won't delete the file until we call Get on the data handle... not sure if I like this side-effect, let's discuss in another thread.

Comment on lines 121 to 123
Get: func() (map[string]string, []byte, error) {
return get(q.logger, name)
},
@thampiotr (Contributor):

So a call to get will delete the file, and thus types.DataHandle.Get() can only be called once because it has a side-effect (unexpected for a method called Get, btw)

I think that's an important behaviour that needs to be documented. But also not sure if we want this... can you share some context why we do a lazy file reading via DataHandle and delete it only after it's read?

What happens if Get is never called for some reason? Seems like we'd leak a file and it would be picked up again on next run?

@mattdurham (Collaborator, author):

The assumption (that I should document) is that Get is called when the caller is ready to process the file, to limit the amount of lost data and limit the amount of memory in use. The out queue (shown in upcoming PR) has a capacity of one, so if Get is never called then data was never sent. This allows us to err on the side of dropping data versus sending duplicates.

@thampiotr (Contributor):

I think we should make it super clear in the name that you can retrieve data once only and there's a side-effect of it being deleted forever. Maybe Pop as it makes sense in the context of the queue.

> This allows us to err on the side of dropping data versus sending duplicates.

Wouldn't we prefer to send duplicates? Or is the retry / backoff handled further down, in next PRs?

I'm also not clear on the implications of Get never being called for some reason (e.g. error). Seems like a file won't be deleted, but we will continue sending DataHandles for the subsequent elements in the queue. So over time we can build up a bunch of unprocessed files left behind? At startup we would try to re-send them and likely fail due to too-old-timestamp. Maybe we should just make sure that Get is always called, even when error happens - it's not the cleanest way, but at least allows us to keep doing this lazy read from disk pattern. I'm cool if we decide that's what we want to do, but let's leave some comments behind ;)

@mattdurham (Collaborator, author):

Retry/backoff is handled in the network loop. Regarding Get not being called: the actual implementation of the callee processes files one at a time. If Get is not called, the file is retried on restart, which requeues it. In the main PR the lifecycles of the FileQueue and the Endpoint (the callee in practice) are tied together at the component level. From the filequeue's perspective, the endpoint processes files sequentially, deserializing and adding to the network buffer, and stops pulling files when the network buffer is full.

@mattdurham (Collaborator, author):

Pop is more accurate to what's going on, so I will change that and add a comment.

@mattdurham mattdurham merged commit 4670f64 into dev.new-wal Sep 6, 2024
14 of 15 checks passed
@mattdurham mattdurham deleted the file_queue branch September 6, 2024 15:04
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 7, 2024