Filebeat utf-8 encoding doesn’t honor BOM #1349
Comments
Same error with Winlogbeat: C:\communitor\winlogbeat>winlogbeat.exe -configtest This, ironically, makes you unable to edit the Winlogbeat config ON Windows.
@StianOvrevage Not sure if the two errors are related. The above problem is about the content of the log files. Your issue is related to the config files. Which editor did you use to modify the config file?
I'm using the default (Notepad) editor which every Windows admin uses ;) I used Notepad++ to remove the BOM but it's not an ideal solution.
But yes, I didn't realize that this issue was about reading the actual log files; I just assumed it was the same error I hit upon. But the problem is largely the same. A leading BOM will prevent Winlogbeat from starting.
@StianOvrevage Thanks for bringing up this issue. As these two are not directly related, could you open a separate GitHub issue for further discussion?
@prehor Could you provide an example log file (only a few lines) for download, just to make sure we test exactly the same file? The problem described above is a real issue and we will try to find a way to fix it. Unfortunately it complicates our reading logic quite a bit.
@ruflin Here is the beginning of an Exchange Message Tracking file with a UTF-8 BOM.
Reading a file with a bom included the bom with the first event. This change removes the bom part from the first event in case it exists. * Tests for utf-8 and utf-16 added Closes elastic#1349
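For readers who want to see the idea behind that change, here is a minimal sketch in Go. It is not the actual Filebeat code: `stripUTF8BOM` and `readLines` are hypothetical names made up for illustration, and the sketch only handles the UTF-8 case, assuming the BOM can only occur at the very start of the first line.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// utf8BOM is the UTF-8 encoding of U+FEFF.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// stripUTF8BOM (hypothetical helper) returns line without a leading
// UTF-8 BOM. Lines that do not start with a BOM are returned unchanged.
func stripUTF8BOM(line []byte) []byte {
	return bytes.TrimPrefix(line, utf8BOM)
}

// readLines (hypothetical helper) splits input into lines and strips a
// BOM from the first line only, since that is the only place it may appear.
func readLines(input string) []string {
	var lines []string
	scanner := bufio.NewScanner(strings.NewReader(input))
	first := true
	for scanner.Scan() {
		line := scanner.Bytes()
		if first {
			line = stripUTF8BOM(line)
			first = false
		}
		// Copy to a string right away: the scanner reuses its buffer.
		lines = append(lines, string(line))
	}
	return lines
}

func main() {
	// Sample input: a BOM followed by two JSON lines.
	input := "\xEF\xBB\xBF{\"a\":1}\n{\"b\":2}\n"
	for i, l := range readLines(input) {
		fmt.Printf("%d: %q\n", i, l)
	}
}
```

With the BOM removed, the first line starts with `{` and can be handed to a JSON decoder like every other line.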
Hi, I think I am getting the same issue when trying to parse JSON log files with filebeat 6.2.4.
Is this the same issue?
prospector configuration allows me to select This only occurs with the first line in each log file, but the rest of the lines are correctly parsed
@dannygoulder Looks oddly similar. Looking at the code and test cases, we should be stripping the BOM: beats/filebeat/input/log/harvester.go Lines 260 to 264 in 1789ef9
I would check with a hex editor to see what character is at the beginning, and I would also check the output of This is a hexdump of a UTF-8 file with a BOM.
output of the file command
Hi, thanks for the reply. I thought for a moment that I was getting a strange-looking BOM, but then I realised that hexdump was swapping the bytes with the default
Any idea what I can do next?
Oh, and in case it matters, the file is coming from a Windows 2016 system, upon which filebeat is running. Obviously the hexdump and file commands are being run against the file after copying it to a Linux system. :)
For the hexdump, I presume we don't have the same defaults, OS X vs. Linux.
I am not sure what is going on here. @dannygoulder let's create a new issue. Let's make sure we include the Filebeat version, and could you create a small file that we can use to reproduce it?
I'm also seeing issues on Windows Server 2019 where filebeat fails to parse the first line of a JSON log file when the file format (as indicated by Notepad++) is Using Filebeat 6.4.3, this is logged:
Please check for "UCS2 LE BOM" (Notepad++), i.e. the little-endian BOM. The first bytes are FF FE in my hex editor; this was found in SQL Server 2019 logs. Error in filebeat: Thanks in advance
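To tell the BOM variants mentioned in this thread apart without a hex editor, something like the following Go sketch could be used. `detectBOM` is a hypothetical helper, not part of Filebeat, and it only knows the three byte sequences discussed here (EF BB BF, FF FE, FE FF).

```go
package main

import (
	"bytes"
	"fmt"
)

// detectBOM (hypothetical helper) inspects the first bytes of a file and
// reports which BOM, if any, they start with, plus the BOM length in bytes.
// Caveat: a UTF-32LE BOM (FF FE 00 00) also starts with FF FE and would be
// reported as UTF-16LE by this simplified check.
func detectBOM(head []byte) (name string, size int) {
	switch {
	case bytes.HasPrefix(head, []byte{0xEF, 0xBB, 0xBF}):
		return "UTF-8", 3
	case bytes.HasPrefix(head, []byte{0xFF, 0xFE}):
		return "UTF-16LE", 2
	case bytes.HasPrefix(head, []byte{0xFE, 0xFF}):
		return "UTF-16BE", 2
	}
	return "", 0
}

func main() {
	// Sample data: FF FE followed by "{" encoded as UTF-16LE.
	name, size := detectBOM([]byte{0xFF, 0xFE, '{', 0x00})
	fmt.Println(name, size)
}
```

A file that reports UTF-16LE here needs the matching encoding configured on the reader; stripping the two BOM bytes alone would still leave UTF-16 data behind.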
Windows often prepends a UTF-8 BOM to text files, which is legal (see "Can a UTF-8 data stream contain the BOM character (in UTF-8 form)?"). In my case it happens in Exchange Message Tracking logs.
Filebeat should strip the BOM from files with UTF-8 encoding, but it doesn't, so the BOM appears in the message field for the first line of a file:
Filebeat config:
Hex dump of first line in file MSGTRK20160405-1.LOG:
The first three bytes, EF BB BF (the UTF-8 encoded BOM), are decoded to the Unicode character U+FEFF, which then appears in the message field.
I'm using filebeat-1.2.0-darwin.
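The decoding step described above can be reproduced with the Go standard library. This is only an illustration of why the character ends up in the message field, not Filebeat code; `firstRune` is a hypothetical wrapper around `utf8.DecodeRune`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRune (hypothetical helper) decodes the first UTF-8 sequence in b
// and returns the code point together with the number of bytes it used.
func firstRune(b []byte) (rune, int) {
	return utf8.DecodeRune(b)
}

func main() {
	// EF BB BF followed by an ordinary ASCII character.
	r, size := firstRune([]byte{0xEF, 0xBB, 0xBF, '#'})
	fmt.Printf("%U (%d bytes)\n", r, size) // prints "U+FEFF (3 bytes)"
}
```

Because EF BB BF is a valid UTF-8 sequence, a plain UTF-8 decoder passes U+FEFF through as ordinary text unless the reader removes it explicitly.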