
in_tail: avoid locking files on Windows #1159

Merged
merged 1 commit into from
Mar 25, 2019

Conversation

fujimotos
Member

Previously, flb_tail_file_append() used POSIX open(2) to open log files.
The problem with this approach was that it silently acquires a delete
lock on the file, and thus can interfere with file maintenance operations
performed by the logging process (like log file rotation).

    PS > rm access.log
    rm : cannot remove item access.log: The process cannot access the
    file 'access.log' because it is being used by another process.

This patch resolves the issue by using CreateFile() instead of POSIX open()
and enabling resource sharing (FILE_SHARE_*) explicitly.

This patch is intended to be merged after v1.1 release

Part of #960

Signed-off-by: Fujimoto Seiji <[email protected]>
@edsiper edsiper merged commit 031cd5c into fluent:master Mar 25, 2019
@fujimotos fujimotos deleted the sf/tail-posix-open branch December 6, 2019 07:03
@djsly

djsly commented Mar 12, 2020

@fujimotos we tried the latest rc release and we are getting constant file locking when using Fluent Bit for Windows on Kubernetes.

When a Windows pod gets deleted, the pod stays in the Terminating state forever until we delete the Fluent Bit pod running on the same node.

The error in the kubelet shows that the log file associated with the deleted pod cannot be found or is inaccessible.

@fujimotos
Member Author

When a Windows pod gets deleted, the pod stays in the Terminating state forever until we delete the Fluent Bit pod running on the same node.

@djsly That's weird.

This patch is for in_tail, so that the plugin won't acquire an exclusive lock on the log files
it is reading (this basically allows other processes to modify a log file while
Fluent Bit holds an open handle on it).

It sounds to me like your issue is not directly related to this patch, but caused by
some other bug.

Can you provide the following information so that I can investigate the issue a bit more?

  • Fluent Bit's configuration for the pod
  • Fluent Bit's log file for the pod
  • (more information on this issue, if you have any)

@djsly

djsly commented Mar 12, 2020

@fujimotos Sure, I will let @titilambert create a new issue with the latest config. We can reproduce it quite easily, so that's a good thing.

@titilambert

Hello @fujimotos

Sorry for the delay.

I'm using the latest Windows version from the PR thread.

Here is the configuration:

fluent-bit.conf 
[SERVICE] 
        Flush        5s
        Daemon       Off
        Log_Level    info
        Parsers_File parsers.conf
        Plugins_File plugins.conf
        HTTP_Server  On
        HTTP_Listen  0.0.0.0
        HTTP_Port    2020
        
[INPUT] 
        Name             tail
        Path             C:\k\kubelet.err.log
        Parser           raw
        Path_Key         kubelet_err_log
        DB               C:\tail2.db
        Tag              host.* 
        Refresh_Interval 60
        Rotate_Wait      5
        Skip_Long_Lines  On
        DB.Sync          Normal

[FILTER]
        Name modify
        Match host.*
        Add _HOSTNAME ${NODE_NAME}

[INPUT]
        Name             tail
        Path             C:\var\log\containers\*
        Parser           docker
        DB               C:\tail.db
        Tag              containers.*
        Refresh_Interval 60
        Rotate_Wait      5
        Skip_Long_Lines  On 
        DB.Sync          Normal

[FILTER]
        Name             kubernetes
        Match            containers.*
        tls.verify       On
        Kube_URL         https://kubernetes.default.svc.cluster.local:443
        Kube_CA_File     /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File  /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix  kube.var.log.containers.
        Merge_Log        On
        Merge_Log_Key    log_processed

[OUTPUT]
        Name forward
        Match *
        Host receiver.svc.cluster.local
        Port 24321
        tls off
        tls.verify on

Here are the logs:

Fluent Bit v1.4.0
Copyright (C) Treasure Data

[2020/03/16 15:36:10] [ info] [storage] initializing...
[2020/03/16 15:36:10] [ info] [storage] in-memory
[2020/03/16 15:36:10] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/03/16 15:36:10] [ info] [engine] started (pid=7824)
[2020/03/16 15:36:10] [ info] [filter_kube] https=1 host=kubernetes.default.svc.cluster.local port=443
[2020/03/16 15:36:10] [ info] [filter_kube] local POD info OK
[2020/03/16 15:36:10] [ info] [filter_kube] testing connectivity with API server...
[2020/03/16 15:36:10] [ info] [filter_kube] API server connectivity OK
[2020/03/16 15:36:10] [ info] [sp] stream processor started

When I delete a Windows pod, the pod stays in Terminating:

telegraf-windows-node-hj76f                               0/1     Terminating   0          5h30m

When I delete the Fluent Bit Windows pod, the telegraf Windows pod stops correctly.

Thanks !

@titilambert

@fujimotos do you prefer a new issue for that?

@fujimotos
Member Author

@titilambert Thank you. I created a new issue #2027 to track this bug.
