[Promtail] Support reading from compressed log files #5956

Closed
frittentheke opened this issue Apr 19, 2022 · 9 comments · Fixed by #6708
Labels
component/promtail keepalive An issue or PR that will be kept alive and never marked as stale.

Comments

@frittentheke
Contributor

frittentheke commented Apr 19, 2022

Is your feature request related to a problem? Please describe.
Currently Promtail can only read from files which are not compressed. At the same time, applying compression is quite common when logging and keeping a few days' worth of logs on a machine.

  1. While it's quite common to have uncompressed log files for actively written files and then compress them on rotation, there are certain loggers or shippers such as rsyslog that can even write a gzip stream right away, doing inline compression. This is especially helpful for access logs or event logs, which are high volume but also extremely compressible due to their very repetitive patterns or wasteful structure such as JSON. This approach also saves on disk IO, as only a tiny fraction of the log is written to the disk.
  2. Another aspect is the issue of being unable to read and backfill somewhat older log data that has already been compressed on the source machine. This could be due to Promtail not running or Loki being unavailable while log rotation and compression happen. While logrotate has a feature to delay compression for one iteration (usually used to allow shippers to finish slurping the file), this requires much more disk space on the machine (see the reasoning on huge access logs above).

Describe the solution you'd like
Either an option within the scrape config to allow compressed files to be considered, or simply the ability for e.g. a gzipped file to be read transparently. It's the same data as an uncompressed file; all the other checks and limits still apply.

Describe alternatives you've considered
There really is no alternative: either Promtail natively supports reading a log stream from compressed files, or some manual action is required from operations to either decompress the files again or pipe them to Promtail somehow.

Additional context

@DylanGuedes
Contributor

+1, looks like an important thing to support to make Promtail more robust.

@ecliptik

ecliptik commented May 6, 2022

We are also interested in this, as enabling compression when using multi-AZ architectures in AWS has significant savings on inter-AZ data transfer costs.

Anecdotal, but enabling compression in Pulsar messaging saved 40% on inter-AZ data transfer. Having the ability to compress in Promtail would have a similar impact.

@DylanGuedes
Contributor

DylanGuedes commented Jun 10, 2022

Hey @frittentheke and @ecliptik, I was taking a look at this the last few days and I have a few questions for you:

  • Do you think it would be enough to decompress the file completely in a single pass instead of doing it inline (i.e. decompress a batch and parse it, then rotate to the next batch, decompress and parse it, and so on)?
  • Do you all see a problem in specifying the compression algorithm/format instead of Promtail inferring it?

I'm planning on working on it this month but I'm still designing how it is going to work so these questions would help me decide a few things. Thank you in advance.

@stale

stale bot commented Jul 10, 2022

Hi! This issue has been automatically marked as stale because it has not had any
activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project.
A stalebot can be very useful in closing issues in a number of cases; the most common
is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely
    to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues, but it can be a challenging task;
our sincere apologies if you find yourself at the mercy of the stalebot.

@DylanGuedes DylanGuedes added the keepalive An issue or PR that will be kept alive and never marked as stale. label Jul 11, 2022
@frittentheke
Contributor Author

frittentheke commented Jul 11, 2022

Very sorry for failing to respond to your questions @DylanGuedes - thanks first of all for picking up the issue!

Hey @frittentheke and @ecliptik, I was taking a look at this the last few days and I have a few questions for you:
* Do you think it would be enough to decompress the file completely in a single pass instead of doing it inline (i.e. decompress a batch and parse it, then rotate to the next batch, decompress and parse it, and so on)?

I am unsure if I understand you correctly.
First, I would expect Promtail to gain the ability to read from a compressed file just like a "regular" file, per this feature request. It's then just another stream of data fed into the parser / pipeline, just like from other files or sources. But a valid question is whether and how the positions need to be tracked and saved for those files as well. I would argue: YES.

First, to avoid reading such files again; but also because reading from a compressed file can be interrupted just as easily as reading an uncompressed one. Expecting a potentially huge file to be read and shipped successfully in one go, with no way to resume after an interruption, seems unnecessary.
Last but not least, it seems that the positions data structure (https://github.com/grafana/loki/blob/main/clients/pkg/promtail/positions/positions.go) might not even need any changes. It's more about being able to seek to a certain point in the compressed file when resuming from a position other than 0, and also being able to actively tail a compressed file that is still being appended to (logs written in compressed format rather than compressed as part of log rotation).
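To illustrate the seeking point: a compressed stream cannot be seeked cheaply, so resuming would mean decompressing and discarding everything up to the saved offset. A minimal sketch, assuming the saved position is an offset into the uncompressed stream (the function name is hypothetical, not Promtail's actual code):

```go
package main

import (
	"compress/gzip"
	"io"
	"os"
)

// resumeCompressed opens a gzipped log file and skips ahead to a
// previously saved offset in the uncompressed stream. Unlike a plain
// file there is no cheap Seek: the bytes up to the offset have to be
// decompressed and thrown away again.
func resumeCompressed(path string, offset int64) (io.Reader, func() error, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, nil, err
	}
	zr, err := gzip.NewReader(f)
	if err != nil {
		f.Close()
		return nil, nil, err
	}
	// Discard everything that was already shipped.
	if _, err := io.CopyN(io.Discard, zr, offset); err != nil {
		f.Close()
		return nil, nil, err
	}
	return zr, f.Close, nil
}

func main() {
	r, closeFile, err := resumeCompressed("access.log.gz", 4096)
	if err != nil {
		panic(err)
	}
	defer closeFile()
	// r now continues right after the saved offset.
	_, _ = io.Copy(os.Stdout, r)
}
```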

* Do you all see a problem in specifying the compression algorithm/format instead of Promtail inferring it?

Not really, but some thoughts on possible implementations of "auto-detection":

  1. Use the file extension (.gz, .bz2, ...). I believe this is a clean interface for the operator and would work in conjunction with file wildcards, e.g. "*.gz", to actively have Promtail look at compressed files.
  2. Use the magic bytes of the supported formats to auto-detect the file type (like here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/extract-vmlinux); see the sketch after this list.
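For illustration, the magic-byte approach could look like the following sketch (the signatures are the standard ones for gzip, bzip2, and xz; the function name is hypothetical):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// detectCompression peeks at the first few bytes of a file and
// matches them against well-known magic numbers.
func detectCompression(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	buf := make([]byte, 6)
	n, err := f.Read(buf)
	if err != nil && err != io.EOF {
		return "", err
	}
	buf = buf[:n]

	switch {
	case bytes.HasPrefix(buf, []byte{0x1f, 0x8b}): // gzip
		return "gzip", nil
	case bytes.HasPrefix(buf, []byte("BZh")): // bzip2
		return "bzip2", nil
	case bytes.HasPrefix(buf, []byte{0xfd, '7', 'z', 'X', 'Z', 0x00}): // xz
		return "xz", nil
	default:
		return "none", nil
	}
}

func main() {
	format, err := detectCompression("access.log.gz")
	if err != nil {
		panic(err)
	}
	fmt.Println("detected:", format)
}
```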

In any case, I would expect a helpful error message in case a format is not supported. Even though Go supports almost every sensible format, I believe it's quite rare to see 7z used for logs.

I'm planning on working on it this month but I'm still designing how it is going to work so these questions would help me decide a few things. Thank you in advance.

Thanks again and sorry again for the late reply.

@DylanGuedes
Contributor

Here's the thing: I think for most scenarios, users aren't appending more compressed data to a compressed file; what they normally want instead is to ingest compressed data a single time and that's it (similar to a batch job). Do you think this claim makes sense? If so, maybe we should:

  • Implement a new way of reading stuff in Promtail, I'd name it Batch ingestion or something similar
  • Implement a new client only for that. This would be the easiest solution.

WDYT?

@frittentheke
Contributor Author

frittentheke commented Jul 15, 2022

Here's the thing: I think for most scenarios, users aren't appending more compressed data to a compressed file; what they normally want instead is to ingest compressed data a single time and that's it (similar to a batch job). Do you think this claim makes sense? If so, maybe we should:

* Implement a new way of reading stuff in Promtail, I'd name it Batch ingestion or something similar

What would make this "batch" type of ingestion any different from slurping in "regular" log files?
If you only want this to happen once and not make it a regular config, you can always feed some older data into Promtail, e.g. cat logfile.log | promtail --stdin, as documented here: https://grafana.com/docs/loki/latest/clients/promtail/troubleshooting/#pipe-data-to-promtail

You can then also decompress on the fly before piping into Promtail - zcat logfile.log.gz | promtail --stdin or cat logfile.log.gz | gunzip - | promtail --stdin.

Promtail does it the UNIX way: it is easy to combine with other tools via pipes, and it is more than versatile enough to be used in ad-hoc scripts doing any sort of batch import.

So in short I do not believe there is a case for any new "way" of reading stuff.

* Implement a new client only for that. This would be the easiest solution.

WDYT?

@DylanGuedes I honestly believe we lost track of what my actual intention was: "Giving Promtail the ability to read from files which are compressed".

This in essence is only about recognizing that a file is compressed and then running the incoming data stream through a suitable library before the rest of the log parsing and shipping happens. In the case of Go this likely would be https://pkg.go.dev/compress, which can read compressed files (like zcat in my example above) and feed them into the rest of Promtail, which likely needs no changes. The only thing I did suggest is to also track the progress for those files, both to avoid reading and shipping them again and to allow for those files to have data appended.

So, to form a list of what I believe needs to be implemented / done:

  • Allow Promtail to either recognize that a file is compressed and then switch to a suitable reader (https://pkg.go.dev/compress) for this file, or just use the file extension (.gz, .gzip, .bz2, .xz, ...) to switch readers; see the sketch after this list.
  • Check whether the existing progress tracking and file tailing (data is still being appended) work transparently.
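A rough sketch of the reader-switching idea, using only standard-library packages (xz would need a third-party package and is omitted; all names here are illustrative, not Promtail's actual code):

```go
package main

import (
	"bufio"
	"compress/bzip2"
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// openLogFile picks a reader based on the file extension and falls
// back to plain reading for anything it does not recognize.
func openLogFile(path string) (io.Reader, func() error, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, nil, err
	}
	switch filepath.Ext(path) {
	case ".gz", ".gzip":
		zr, err := gzip.NewReader(f)
		if err != nil {
			f.Close()
			return nil, nil, err
		}
		return zr, f.Close, nil
	case ".bz2":
		return bzip2.NewReader(f), f.Close, nil
	default:
		return f, f.Close, nil
	}
}

func main() {
	r, closeFile, err := openLogFile("access.log.gz")
	if err != nil {
		panic(err)
	}
	defer closeFile()

	// The uncompressed lines would be handed to the usual pipeline.
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}
```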

@DylanGuedes
Contributor

Thanks for the clarification, it makes much more sense now. Also, in my previous message I overlooked the possibility of specifying folders that receive new compressed files.

@09jvilla
Contributor

09jvilla commented Aug 2, 2022

@ecliptik can you explain a little bit more about your use case?

We are also interested in this, as enabling compression when using multi-AZ architectures in AWS has significant savings on inter-AZ data transfer costs. Anecdotal, but enabling compression in Pulsar messaging saved 40% on inter-AZ data transfer. Having the ability to compress in Promtail would have a similar impact.

Does this mean you have Promtail in 1 availability zone reading a file in another AZ and are therefore having to pay data transfer costs for that transaction? Is that right?

I guess I was just thinking that Promtail is generally in the same AZ as the files it is reading, in which case the only data transfer cost is when Promtail sends to Loki (at which point we do already compress the data before sending).

DylanGuedes added a commit that referenced this issue Sep 27, 2022
**What this PR does / why we need it**:
Adds to Promtail the ability to read compressed files. It works by:
1. Infer which compression format to use based on the file extension
2. Uncompress the file with the native `golang/compress` packages
3. Iterate over uncompressed lines and send them to Loki

Its usage is the same as our current file tailing.

**Which issue(s) this PR fixes**:
Fixes #5956 

Co-authored-by: Danny Kopping <[email protected]>
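Since the PR states that usage matches regular file tailing, the scrape config after this change would presumably just glob for compressed files. A hypothetical example (job name, labels, and paths are illustrative):

```yaml
scrape_configs:
  - job_name: archived-logs
    static_configs:
      - targets: [localhost]
        labels:
          job: archived
          # Matching *.gz picks up compressed files; the format is
          # inferred from the file extension.
          __path__: /var/log/*.gz
```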