Set offset of files under ignore_older to file.size() #2907

Merged: 2 commits, Nov 3, 2016
3 changes: 3 additions & 0 deletions CHANGELOG.asciidoc
@@ -21,6 +21,9 @@ https://github.com/elastic/beats/compare/v5.0.0...master[Check the HEAD diff]
*Topbeat*

*Filebeat*
+- If a file falls under ignore_older during startup, its offset is now set to the end of the file instead of 0.
+With the previous logic, the whole file was sent when a line was added, which was inconsistent with
+files that were harvested previously. {pull}2907[2907]

*Winlogbeat*

8 changes: 4 additions & 4 deletions filebeat/docs/faq.asciidoc
@@ -20,8 +20,8 @@ Filebeat might be incorrectly configured or unable to send events to the output.

* Make sure the config file specifies the correct path to the file that you are collecting. See <<filebeat-configuration>>
for more information.
-* Verify that the file is not older than the value specified by <<ignore-older,`ignore_older`>>. By default, Filebeat
-stops reading files that are older than 24 hours. You can change this behavior by specifying a different value for
+* Verify that the file is not older than the value specified by <<ignore-older,`ignore_older`>>. `ignore_older` is disabled by
+default, so this depends on the value you have set. You can change this behavior by specifying a different value for
<<ignore-older,`ignore_older`>>.
* Make sure that Filebeat is able to send events to the configured output. Run Filebeat in debug mode to determine whether
it's publishing events successfully:
@@ -47,7 +47,7 @@ There are additional configuration options that you can use to close file handle

The `close_renamed` and `close_removed` options can be useful on Windows to resolve issues related to file rotation. See <<windows-file-rotation>>. The `close_eof` option can be useful in environments with a large number of files that have only very few entries. The `close_timeout` option is useful in environments where closing file handlers is more important than sending all log lines. For more details, see <<configuration-filebeat-options>>.

Make sure that you read the documentation for these configuration options before using any of them.

[float]
[[reduce-registry-size]]
@@ -112,4 +112,4 @@ harvested, a newline character is required after the last line, or Filebeat will
the file.

include::../../libbeat/docs/faq-limit-bandwidth.asciidoc[]
include::../../libbeat/docs/shared-faq.asciidoc[]
@@ -175,7 +175,7 @@ The files affected by this setting fall into two categories:
* Files that were never harvested
* Files that were harvested but weren't updated for longer than `ignore_older`

-When a file that has never been harvested is updated, the reading starts from the beginning as the state of the file was created with the offset 0. For a file that has been harvested previously, reading continues at the last position.
+For files that have never been seen before, the offset state is set to the end of the file. If a state already exists, the offset is not changed. If a file is updated again later, reading continues at the set offset position.

The `ignore_older` setting relies on the modification time of the file to determine if a file is ignored. If the modification time of the file is not updated when lines are written to a file (which can happen on Windows), the `ignore_older` setting may cause Filebeat to ignore files even though content was added at a later time.
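
The offset rule described above can be modeled in a few lines. This is a minimal Python sketch under stated assumptions, not Filebeat's Go implementation: the `State` class and the `is_ignored`, `handle_ignore_older`, and `demo` helpers are hypothetical names introduced only for illustration, and `ignore_older` is expressed in plain seconds.

```python
import os
import tempfile
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class State:
    """Hypothetical stand-in for a registry entry (not Filebeat's real type)."""
    source: str
    offset: int
    finished: bool = True


def is_ignored(path: str, ignore_older_seconds: float,
               now: Optional[float] = None) -> bool:
    """Return True if the file's modification time is older than ignore_older."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) > ignore_older_seconds


def handle_ignore_older(path: str, last_state: Optional[State]) -> State:
    """Pick the registry state for a file that falls under ignore_older."""
    if last_state is not None:
        # A state already exists: the stored offset is left unchanged.
        return last_state
    # Never seen before: record the end of the file, so a later update
    # sends only the newly appended lines instead of the whole file.
    return State(source=path, offset=os.path.getsize(path))


def demo() -> dict:
    """Create an old file and show both branches of the rule."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "old.log")
        with open(path, "wb") as f:
            f.write(b"line 1\nline 2\n")
        # Pretend the file was last modified two hours ago.
        two_hours_ago = time.time() - 7200
        os.utime(path, (two_hours_ago, two_hours_ago))

        new_state = handle_ignore_older(path, last_state=None)
        kept_state = handle_ignore_older(path, State(path, offset=7))
        return {
            "ignored": is_ignored(path, ignore_older_seconds=3600),
            "new_offset": new_state.offset,
            "kept_offset": kept_state.offset,
            "size": os.path.getsize(path),
        }
```

Calling `demo()` exercises both cases: a file without a state gets an offset equal to its size, while an existing state keeps its offset. Because the check relies only on the modification time, a file whose mtime is not updated on write would still be reported as ignored by `is_ignored`.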

4 changes: 4 additions & 0 deletions filebeat/prospector/prospector_log.go
@@ -298,6 +298,10 @@ func (p *ProspectorLog) handleIgnoreOlder(lastState, newState file.State) error
return nil
}

+// Set offset to end of file to be consistent with files which were harvested before
+// See https://github.com/elastic/beats/pull/2907
+newState.Offset = newState.Fileinfo.Size()
Review comment: Add a comment explaining why we set the offset to the file size and why the ignored file is added to the registry. See the PR description.

Reply (Contributor Author): Documentation note added. For all details, I referenced this PR.


// Write state for ignore_older file as none exists yet
newState.Finished = true
err := p.Prospector.updateState(input.NewEvent(newState))
4 changes: 2 additions & 2 deletions filebeat/tests/system/test_registrar.py
@@ -1328,8 +1328,8 @@ def test_ignore_older_state(self):
         data = self.get_registry()
         assert len(data) == 1
 
-        # Check that offset is 0 even though there is content in it
-        assert data[0]["offset"] == 0
+        # Check that offset is set to the end of the file
+        assert data[0]["offset"] == os.path.getsize(testfile1)
 
     def test_ignore_older_state_clean_inactive(self):
         """