Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filebeat doesn't remove deleted files if new File with same name is written #3789

Closed
Shaoranlaos opened this issue Mar 22, 2017 · 9 comments
Closed
Labels
Filebeat Filebeat

Comments

@Shaoranlaos
Copy link

Version: 5.2.2
OS: CentOS 7

Due to an old OS on another machine, under which filebeat doesn't run, i rsync the logs to my elastic server on which a filebeat reads them.

Through this setup i found a bug in the clean_removed Routine, which creates a situation, in combination with the inode reuse under linux, in which there are two entrys in the registry file for the same source path with different inodes.

{ "source":"/root/copied_logs/<path to the same log>", "offset":5243941, "FileStateOS":{ "inode":1433391, "device":2049 }, "timestamp":"2017-03-22T12:59:31.413988399+01:00", "ttl":-1 }, { "source":"/root/copied_logs/<path to the same log>", "offset":5246116, "FileStateOS":{ "inode":1433400, "device":2049 }, "timestamp":"2017-03-22T13:02:32.749620477+01:00", "ttl":-1 }

The second entry is the one that doesn't exist anymore and when the new log file is rsynced it gets the inode from this entry and filebeat trys to continue reading it.

As a solution for this Bug:
The cleanup method should not only check if it can get os.Stat for the source but also if this stat is the same as the one in the registry entry which is searched. Otherwise it should be deleted.

_, err := os.Stat(state.Source)

@ruflin
Copy link
Contributor

ruflin commented Mar 24, 2017

I can see the issue, that the state will not be removed. How does your rsync exactly work? Is the file first removed and later the new one added, or does that happen in one go? Do you have a predefined rsync cycle? If yes, you could use clean_inactive to probably solve your issue.

The part that is not clear to me is why filebeat should continue to read the file? It loads states based on inode/device and not file path.

@monicasarbu monicasarbu added the Filebeat Filebeat label Mar 24, 2017
@Shaoranlaos
Copy link
Author

The rsync syncronises a complete log folder wit the -a option active (so that all timestamps are preserved) every 10 minutes.
The problamatic point is when rsync deletes a out rotated file(e.g. log.9) and renames the file(log.8) to this name(log.9). Through this the filename still exists and the entry in the registry will not be deleted, even if the inode for this deleted file is no longer in use, so that 2 entrys exists simultaniously for the same source path (as shown above).

After the next logrotation and sync phase linux will give the newly created log(log.1) the inode number of the deleted log.9 (through inode reuse) and filebeat will think it is a rename and not a create.

To your questions explicitly:

  • The file is first removed and in a later rsync phase the new one is added.
  • The clean_inactive option is a workaround yes, but only as long as the log rotation happens in a predefiend period and not size based(than only if the grows rate is roughly stabil)
  • Filebeat will continue to read the file because the newly added file has the inode of the old deleted file and this inode has an entry in the registry because it was not deleted because the sourcepath of this entry still exists

@ruflin
Copy link
Contributor

ruflin commented Mar 28, 2017

It is currently by design, that in case of "doubt" states are not removed as normally it is worse to remove one state too many then one too few. But in your inode reuse case this leads to issues.

I'm trying to think trough what the negative / side affects could be if we not only compare the path but also the inode/device and if it does not match, we remove the state.

If a file is renamed after the reading has finished and clean_removed gets into play, the state will be removed even though the file is still around. This should happen rarely as the prospector also keeps track of renamed files and it should not matter, as this file already falls under ignore_older.

Other cases?

@ruflin
Copy link
Contributor

ruflin commented Mar 28, 2017

I created #3827 to run some tests on this change. It affects some test with heavy file rotation. It can definitively be discussed if the new outcome is the one expected.

The most important thing I found is that our docs state:

This setting does not apply to renamed files or files that were moved to another directory that is still visible to Filebeat.

That would be the ideal case for me but as we only compare the source and can't ask the disk to check if an inode/device is still around, I think the above statement is not fully correct :-(

Also the changed behaviour is not perfect in my opinion as it still relies on the file path. The only place where we know that an file handler was removed is in the harvester. Currently the harvester does not report back why it was closed. We could potentially change that in the future that the harvester reports back that a file was deleted and updates the state directly.

@Shaoranlaos
Copy link
Author

Yes but the back reporting would only work if the harvester is still running and not already closed. Doesn't it?

To the possible side affects: i can't think of possible side affects this would have and that would be contrary to the intended function of this feature.
Is the case that you listed realy possibly? I think i read somewhere in the code(but i can't find it in the moment) that between the renaming and the removing a complete scan cycle is.

@ruflin
Copy link
Contributor

ruflin commented Mar 30, 2017

The prospector also reports on file changes like renaming. It happens on each scan.

There are some side affects visible in the failing tests above. But TBH it's not a "real" use case as we use scan_frequency: 0.1s which should never be used in production but it shows some interesting side affects of things and also shows if their are some race conditions.

ruflin added a commit to ruflin/beats that referenced this issue Apr 13, 2017
Previously if a file could be found under the same name as a state, the state was not removed. But this could have been also an other file. With this change also the file itself is compared and if it is not the same file, the state will be removed. This has the affect that in case a file is renamed after monitoring the file finished, the state will be removed. In most cases this should not have any side affect.

The positive effect of this change is that there will be less left over states in the registry.

Closes elastic#3789
@7AC 7AC closed this as completed in #3827 Apr 18, 2017
7AC pushed a commit that referenced this issue Apr 18, 2017
* Modify clean_removed handling

Previously if a file could be found under the same name as a state, the state was not removed. But this could have been also an other file. With this change also the file itself is compared and if it is not the same file, the state will be removed. This has the affect that in case a file is renamed after monitoring the file finished, the state will be removed. In most cases this should not have any side affect.

The positive effect of this change is that there will be less left over states in the registry.

Closes #3789

* implement review changes
@ruflin
Copy link
Contributor

ruflin commented Apr 18, 2017

@Shaoranlaos A change of behaviour was just merged into master: #3827 This will be available in the snapshot builds in the next hours if you are interested to test it: https://beats-nightlies.s3.amazonaws.com/index.html?prefix=filebeat/

@Shaoranlaos
Copy link
Author

@ruflin After a short Test this seems to work as intended.

However, I noticed another problem:
With my rsync structure there was during the test the case where an already existing file got a new inode number during the sync. but this inode belonged before the sync to the rotated file so that filebeat will continue reading the updated existing file on the false offset. This happens only because rsync doesn't handle the renaming of the rotated file correctly.

rename_error

However this is an error specific to my very special use case and i have in the moment a working workaround with the simultaneously sync of only the rotating files that belong together. So there is no need to work on a (in my opinion) complicated solution for this.

@ruflin
Copy link
Contributor

ruflin commented Apr 24, 2017

@Shaoranlaos Thanks for the testing, appreciate it. Agree that the new issue is more a rsync issue then fb issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Filebeat Filebeat
Projects
None yet
Development

No branches or pull requests

3 participants