Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

in_tail: missing log for offset processing due to non-existent old inodes in sqlite. #8025

Closed
jinyongchoi opened this issue Oct 11, 2023 · 8 comments · Fixed by #8062
Closed

Comments

@jinyongchoi
Copy link
Contributor

Bug Report

Describe the bug
When Fluent-bit is monitoring a file that gets deleted after it has been shutdown, and if the inode exists in the SQLite file, it may result in log loss with incorrect offsets from duplicate inodes.

So accurate offset processing is required by comparing the filename of the previous inode.

on EXT4

echo "1" > testfile; stat -c %i testfile; rm -f testfile; echo "1" > testfile; stat -c %i testfile
4223667
4223667

To Reproduce

  • Rubular link if applicable:
  • Example log message if applicable:
[2023/10/11 09:53:11] [debug] [input:tail:tail.0] inode=4486486 with offset=2 appended as /root/testing/bulk2
  • Steps to reproduce the problem:
# clear test files
rm -f /root/testing/bulk1
rm -f /root/testing/bulk2
rm -f /tmp/input.db

# fluentbit start	
/opt/fluent-bit/bin/fluent-bit_2.1.10 -c /opt/fluent-bit/etc/fluent-bit/fluent-bit.conf

echo "1" >> /root/testing/bulk1; stat -c %i /root/testing/bulk1

[2023/10/11 09:51:53] [debug] [input:tail:tail.0] inode=4486486 file=/root/testing/bulk1 promote to TAIL_EVENT
[2023/10/11 09:51:53] [ info] [input:tail:tail.0] inotify_fs_add(): inode=4486486 watch_fd=1 name=/root/testing/bulk1
[2023/10/11 09:51:53] [debug] [input:tail:tail.0] [static files] processed 0b, done
[2023/10/11 09:51:54] [debug] [task] created task=0x7f5d44a39e60 id=0 OK
[2023/10/11 09:51:54] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] tail.0: [[1697017913.231894845, {}], {"log"=>"1"}]

# fluentbit stop
# sqlite3 /tmp/input.db
SQLite version 3.7.17 2013-05-20 00:56:22
sqlite> select * from in_tail_files;
1|/root/testing/bulk1|2|4486486|1697017913|0

# file change
rm -f /root/testing/bulk1; echo "2" >> /root/testing/bulk2 ; echo "3" >> /root/testing/bulk2; stat -c %i /root/testing/bulk2

# fluentbit start
# missing "2" log
/opt/fluent-bit/bin/fluent-bit_2.1.10 -c /opt/fluent-bit/etc/fluent-bit/fluent-bit.conf
[2023/10/11 09:53:11] [debug] [input:tail:tail.0] flb_tail_fs_inotify_init() initializing inotify tail input
[2023/10/11 09:53:11] [debug] [input:tail:tail.0] inotify watch fd=30
[2023/10/11 09:53:11] [debug] [input:tail:tail.0] scanning path /root/testing/bulk*
[2023/10/11 09:53:11] [debug] [input:tail:tail.0] inode=4486486 with offset=2 appended as /root/testing/bulk2
[2023/10/11 09:53:11] [debug] [input:tail:tail.0] scan_glob add(): /root/testing/bulk2, inode 4486486
[2023/10/11 09:53:11] [debug] [input:tail:tail.0] 1 new files found on path '/root/testing/bulk*'
[2023/10/11 09:53:11] [ info] [input:storage_backlog:storage_backlog.1] initializing
[2023/10/11 09:53:11] [ info] [input:storage_backlog:storage_backlog.1] storage_strategy='memory' (memory only)
[2023/10/11 09:53:11] [debug] [storage_backlog:storage_backlog.1] created event channels: read=32 write=33
[2023/10/11 09:53:11] [ info] [input:storage_backlog:storage_backlog.1] queue memory limit: 4.8M
[2023/10/11 09:53:11] [debug] [stdout:stdout.0] created event channels: read=34 write=35
[2023/10/11 09:53:11] [debug] [router] match rule tail.0:stdout.0
[2023/10/11 09:53:11] [debug] [router] match rule storage_backlog.1:stdout.0
[2023/10/11 09:53:11] [ info] [sp] stream processor started
[2023/10/11 09:53:11] [debug] [input chunk] update output instances with new chunk size diff=4096, records=1, input=tail.0
[2023/10/11 09:53:11] [ info] [output:stdout:stdout.0] worker #0 started
[2023/10/11 09:53:11] [debug] [input:tail:tail.0] [static files] processed 2b
[2023/10/11 09:53:11] [debug] [input:tail:tail.0] inode=4486486 file=/root/testing/bulk2 promote to TAIL_EVENT
[2023/10/11 09:53:11] [ info] [input:tail:tail.0] inotify_fs_add(): inode=4486486 watch_fd=1 name=/root/testing/bulk2
[2023/10/11 09:53:11] [debug] [input:tail:tail.0] [static files] processed 0b, done
[2023/10/11 09:53:11] [debug] [task] created task=0x7f8224639e60 id=0 OK
[2023/10/11 09:53:11] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] tail.0: [[1697017991.013734539, {}], {"log"=>"3"}]

# invalid file name
# size change. 2 => 4
sqlite> select * from in_tail_files;
1|/root/testing/bulk1|4|4486486|1697017913|0

Expected behavior
Accurate offset processing is required by comparing the filename of the previous inode.

Screenshots
None

Your Environment

  • Version used:
    • tested 2.1.10
  • Configuration:
[SERVICE]
    flush        1
    daemon       Off
    log_level    debug

    parsers_file parsers.conf
    plugins_file plugins.conf
    storage.metrics on
    storage.path /tmp/storage
    storage.sync normal
    storage.checksum off
    storage.backlog.mem_limit 5M

[INPUT]
    Name tail
    Path /root/testing/bulk*
    Exclude_Path *.gz,*.zip

    Read_from_Head true
    Refresh_Interval 3
    Rotate_Wait 10

    storage.type filesystem
    storage.pause_on_chunks_overlimit true

    DB /tmp/input.db
    DB.sync normal
    DB.locking false

[OUTPUT]
    name  stdout
    match *
  • Environment name and version (e.g. Kubernetes? What version?):
    • docker images CentOS 7.9.2009
    • no k8s
  • Server type and version:
    • Intel i5-10400
    • file system : ssd ext4
  • Operating System and version:
    • base system : Ubuntu 22.04
  • Filters and plugins:
    • in_tail
      Additional context
@jinyongchoi
Copy link
Contributor Author

If there is currently no related PR, I will go ahead and submit a pull request for comparing file names in SQLite.

Please comment on whether I can proceed with the pull request.

@patrick-stephens
Copy link
Contributor

I recall some other issues/PRs on this @jinyongchoi - did you have a quick search? However always good to have a PR so please do.

@jinyongchoi
Copy link
Contributor Author

@patrick-stephens There seems to be the expected code in the last comment of a similar issue (#4895 ), but it appears that there is no PR yet. So i will try to submit a PR.

@RicardoAAD
Copy link
Collaborator

Hello @jinyongchoi,

Thanks for reporting this behavior; I reproduced it by following your instructions.
If you delete a file monitored by Fluent-bit using the tail plugin with the DB option enabled, you must also delete its entry on the DB to avoid this issue.

Example:

$ sqlite3 input.db
SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
sqlite> .tables
in_tail_files
sqlite> select * from in_tail_files;
1|bulk1.log|4|1807178|1697076574|0
sqlite> delete from in_tail_files where id =1 ;
 
sqlite> select * from in_tail_files;
sqlite>

Please if the workaround is not an option for you, could you please describe a production use case where you will be deleting files monitored by Fluent-Bit and what the impact of this issue will be for your organization?

Regards.

@jinyongchoi
Copy link
Contributor Author

@RicardoAAD Hi!

In the case of nginx's access.log, it's common to use 'cron scripts' or 'logrotate.'
Assuming you change access.log to access.log-YYYYMMDDHH, following the instructions in this link (https://www.nginx.com/resources/wiki/start/topics/examples/logrotation/),
when this file is rotated and Fluent-Bit is restarted due to modifications in the Fluent-Bit configuration or Lua script, it may forget the inode as shown in the example.
Therefore, manual verification is necessary.

The second scenario pertains to service logs.
Taking the logback module in Spring Boot as an example, it's common to use a format like APP-yyyymmddhh.log. Assuming the usage of such a format, since disk space is finite, log files will be compressed or deleted over time.
In such cases, if a restart is performed for the same reasons mentioned above, the same issue can occur.

While such occurrences are infrequent, when they do happen, the logs are not handled correctly.
Naturally, in certain situations, when the file size is small, it can be recognized as truncated logic and function correctly. Nevertheless, in our operational environment, we've observed cases where inodes from previous timestamps (created field) exist, signifying that this isn't always the situation.

So, to prevent these two scenarios, I plan to test and submit a pull request (PR).
The first change involves deleting items that are not being monitored by Fluent-Bit when it starts.
The second change involves considering the already running Fluent-Bit instances, as files with the same inode will not be deleted.
Therefore, the option being added includes logic to delete files with different file names.

If it doesn't conform to FluentBit's concept, please feel free to inform me.
Thanks!

jinyongchoi added a commit to jinyongchoi/fluent-bit that referenced this issue Oct 17, 2023
…uent#8025) (1/2)

To prevent incorrect inode references,
FluentBit automatically removes unmanaged inodes during startup.

Signed-off-by: jinyong.choi <[email protected]>
jinyongchoi added a commit to jinyongchoi/fluent-bit that referenced this issue Oct 17, 2023
…uent#8025) (1/2)

To prevent incorrect inode references,
FluentBit automatically removes unmanaged inodes during startup.

Signed-off-by: jinyong.choi <[email protected]>
jinyongchoi added a commit to jinyongchoi/fluent-bit that referenced this issue Oct 18, 2023
…uent#8025) (1/2)

To prevent incorrect inode references,
FluentBit automatically removes unmanaged inodes during startup.

Signed-off-by: jinyong.choi <[email protected]>
jinyongchoi added a commit to jinyongchoi/fluent-bit that referenced this issue Oct 18, 2023
… (1/2)

To prevent incorrect inode references,
FluentBit automatically removes unmanaged inodes during startup.

Signed-off-by: jinyong.choi <[email protected]>
jinyongchoi added a commit to jinyongchoi/fluent-bit that referenced this issue Oct 18, 2023
… (1/2)

To prevent incorrect inode references,
FluentBit automatically removes unmanaged inodes during startup.

Signed-off-by: jinyong.choi <[email protected]>
jinyongchoi added a commit to jinyongchoi/fluent-bit that referenced this issue Oct 18, 2023
…luent#8025)(2/2)

(1/2)
When checking the existence of a file's inode, if the 'compare_filename'
option is enabled, it is modified to compare the filename as well.
If the inode matches but the filename is different, it removes the stale
inode from the database.

Signed-off-by: jinyong.choi <[email protected]>
jinyongchoi added a commit to jinyongchoi/fluent-bit that referenced this issue Oct 18, 2023
… (1/2)

To prevent incorrect inode references,
FluentBit automatically removes unmanaged inodes during startup.

Signed-off-by: jinyong.choi <[email protected]>
jinyongchoi added a commit to jinyongchoi/fluent-bit that referenced this issue Oct 18, 2023
…luent#8025)(2/2)

(1/2)
When checking the existence of a file's inode, if the 'compare_filename'
option is enabled, it is modified to compare the filename as well.
If the inode matches but the filename is different, it removes the stale
inode from the database.

Signed-off-by: jinyong.choi <[email protected]>
jinyongchoi added a commit to jinyongchoi/fluent-bit that referenced this issue Oct 19, 2023
…luent#8025)(2/2)

When checking the existence of a file's inode, if the 'compare_filename'
option is enabled, it is modified to compare the filename as well.
If the inode matches but the filename is different, it removes the stale
inode from the database.

Signed-off-by: jinyong.choi <[email protected]>
jinyongchoi added a commit to jinyongchoi/fluent-bit that referenced this issue Oct 19, 2023
… (1/2)

To prevent incorrect inode references,
FluentBit automatically removes unmanaged inodes during startup.

Signed-off-by: jinyong.choi <[email protected]>
jinyongchoi added a commit to jinyongchoi/fluent-bit that referenced this issue Oct 19, 2023
…luent#8025)(2/2)

When checking the existence of a file's inode, if the 'compare_filename'
option is enabled, it is modified to compare the filename as well.
If the inode matches but the filename is different, it removes the stale
inode from the database.

Signed-off-by: jinyong.choi <[email protected]>
jinyongchoi added a commit to jinyongchoi/fluent-bit that referenced this issue Oct 19, 2023
… (1/2)

To prevent incorrect inode references,
FluentBit automatically removes unmanaged inodes during startup.

Signed-off-by: jinyong.choi <[email protected]>
jinyongchoi added a commit to jinyongchoi/fluent-bit that referenced this issue Oct 19, 2023
…luent#8025)(2/2)

When checking the existence of a file's inode, if the 'compare_filename'
option is enabled, it is modified to compare the filename as well.
If the inode matches but the filename is different, it removes the stale
inode from the database.

Signed-off-by: jinyong.choi <[email protected]>
jinyongchoi added a commit to jinyongchoi/fluent-bit that referenced this issue Oct 19, 2023
… (1/2)

To prevent incorrect inode references,
FluentBit automatically removes unmanaged inodes during startup.

Signed-off-by: jinyong.choi <[email protected]>
jinyongchoi added a commit to jinyongchoi/fluent-bit that referenced this issue Oct 19, 2023
…luent#8025)(2/2)

When checking the existence of a file's inode, if the 'compare_filename'
option is enabled, it is modified to compare the filename as well.
If the inode matches but the filename is different, it removes the stale
inode from the database.

Signed-off-by: jinyong.choi <[email protected]>
@jinyongchoi
Copy link
Contributor Author

@patrick-stephens
This issue is now ready for review.

Today, for testing purposes, the modified Fluent Bit(2.1.10) was applied to one server(1.9.8) that had been running for approximately 5 months, and the following results were obtained in log.

grep stale td-agent-bit*.log
[2023/10/19 19:18:43] [ info] [input:tail:input_1] db: delete unmonitored stale inodes from the database: count=75
[2023/10/19 19:18:43] [ info] [input:tail:input_2] db: delete unmonitored stale inodes from the database: count=75

Thanks!

jinyongchoi added a commit to jinyongchoi/fluent-bit that referenced this issue Dec 22, 2023
… (1/2)

To prevent incorrect inode references,
FluentBit automatically removes unmanaged inodes during startup.

Signed-off-by: jinyong.choi <[email protected]>
jinyongchoi added a commit to jinyongchoi/fluent-bit that referenced this issue Dec 22, 2023
…luent#8025)(2/2)

When checking the existence of a file's inode, if the 'compare_filename'
option is enabled, it is modified to compare the filename as well.
If the inode matches but the filename is different, it removes the stale
inode from the database.

Signed-off-by: jinyong.choi <[email protected]>
jinyongchoi added a commit to jinyongchoi/fluent-bit that referenced this issue Dec 22, 2023
…luent#8025)(2/2)

When checking the existence of a file's inode, if the 'compare_filename'
option is enabled, it is modified to compare the filename as well.
If the inode matches but the filename is different, it removes the stale
inode from the database.

Signed-off-by: jinyong.choi <[email protected]>
Copy link
Contributor

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

@github-actions github-actions bot added the Stale label Jan 18, 2024
@jinyongchoi
Copy link
Contributor Author

Ping!
waiting for merge approval.

edsiper pushed a commit that referenced this issue Apr 9, 2024
To prevent incorrect inode references,
FluentBit automatically removes unmanaged inodes during startup.

Signed-off-by: jinyong.choi <[email protected]>
edsiper pushed a commit that referenced this issue Apr 9, 2024
…8025)(2/2)

When checking the existence of a file's inode, if the 'compare_filename'
option is enabled, it is modified to compare the filename as well.
If the inode matches but the filename is different, it removes the stale
inode from the database.

Signed-off-by: jinyong.choi <[email protected]>
@lecaros lecaros added this to the Fluent Bit v3.0.2 milestone Apr 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants