
Kafka Input can't handle json data #7481

Closed
JackJiaJJ opened this issue May 26, 2023 · 10 comments · Fixed by #7492

@JackJiaJJ

Bug Report

Describe the bug

@tarruda We just built an image from the latest code (which includes the Kafka input). It works well with plain text, but not with JSON. Please see below:

I defined a Kafka input and two outputs (a CloudWatch log group and stdout); here is the configuration:

  inputs: |
    [INPUT]
        Name        kafka
        Brokers     $myKafkaServers
        Topics      $myKafkaTopic

        rdkafka.debug All
        rdkafka.enable.ssl.certificate.verification true
        rdkafka.ssl.certificate.location /certs/cert.pem
        rdkafka.ssl.key.location /certs/cert.key
        rdkafka.security.protocol ssl
        rdkafka.auto.offset.reset earliest
        rdkafka.enable.auto.commit false
        rdkafka.log_level 7

  outputs: |
    [OUTPUT]
        Name        cloudwatch_logs
        Match       *
        region us-east-1
        log_group_name /aws/clientLogTestGroup
        log_format json/emf
        log_stream_prefix fluent-bit-from-kafka-for-apic-
    [OUTPUT]
        Name stdout
        Match *

  1. When I produce plain-text data to my Kafka topic, I can see it in the AWS log group:

$ ./kafka-console-producer.sh --broker-list $myKafkaServer --producer.config client.properties --topic $myKafkaTopic
>test only

I can see the text in the AWS log group (screenshot attached).

  2. If I produce JSON data like the following:
>"{\"name\":\"jiajj\"}"

or
>{"name":"jack"}

it doesn't work: in the log group, the payload is missing (screenshot attached).

Could you please give some advice about it? Thanks!
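As an aside, note that the two example payloads above are not equivalent JSON: the quoted/escaped form is a JSON *string* whose content happens to be JSON, while the bare form is a JSON *object*. A quick Python check (illustrative only, not part of Fluent Bit) shows the difference:

```python
import json

# The quoted/escaped payload decodes to a string, not an object:
a = json.loads('"{\\"name\\":\\"jiajj\\"}"')
print(type(a).__name__)   # str

# Decoding that string a second time yields the object:
print(json.loads(a))      # {'name': 'jiajj'}

# The bare payload decodes directly to an object:
b = json.loads('{"name":"jack"}')
print(b)                  # {'name': 'jack'}
```

So even once JSON parsing is fixed, the escaped form would still come out as a single string field rather than structured data.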



tarruda commented May 26, 2023

I've investigated and found the commit that introduced the bug. The plugin was changed to use the log event abstraction, but the JSON conversion wasn't handled.

@edsiper @Syn3rman is there a function like flb_pack_json that converts JSON to the new log event abstraction format?


tarruda commented May 26, 2023

@JackJiaJJ I'm working on a fix, but I will also switch the implementation to use a configuration key for deciding if JSON should be parsed.

What should the default be, in your opinion? Is JSON the most common data format for Kafka? If so, I could make JSON parsing the default behavior.

@CherryJia

@tarruda In my opinion, most projects use JSON as a data exchange format (not only with Kafka), so we would prefer JSON as the default. @JackJiaJJ FYI

@stephanmiehe

I would second the above; with the adoption of OpenTelemetry, I'd imagine many companies are moving to JSON.


beleam commented May 27, 2023

Temporary workaround:

With this setup, you can get the JSON parse behaviour by adding a filter with a JSON parser: https://docs.fluentbit.io/manual/pipeline/filters/parser
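A sketch of that workaround, assuming the built-in json parser from Fluent Bit's default parsers.conf; the key name payload is an assumption here (check your stdout output for the actual key holding the raw Kafka message):

    [FILTER]
        Name         parser
        Match        *
        Key_Name     payload
        Parser       json
        Reserve_Data On

Reserve_Data On keeps the other fields of the record (topic, offset, etc.) alongside the parsed payload.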

I actually needed avro parsing and spent extra effort to produce json, so would prefer customizable parsing vs default json. Would also find arrow processing useful.

@leonardo-albertovich
Collaborator

It seems like the issue with that PR is that it should've used flb_log_event_encoder_append_body_raw_msgpack instead of flb_log_event_encoder_append_body_binary.


tarruda commented May 29, 2023

Created #7492 to fix the issue (thanks @leonardo-albertovich for the quick parsing fix). I also added a data_format configuration option; for now, only none and json are supported (none is the default).
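Based on the description above, the new option would be set on the input, roughly like this (a sketch assuming the option name data_format as stated in this comment; check PR #7492 for the final option name and semantics):

    [INPUT]
        Name        kafka
        Brokers     $myKafkaServers
        Topics      $myKafkaTopic
        data_format json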

@JackJiaJJ can you check that branch and LMK if all is good?

@martinsagan

Hello, is there any news about this fix? When will it be available?
How can I set the data_format variable? Will it be a new parameter?
Thanks.


patrick-stephens commented Jun 23, 2023

> Hello, is there any news about this fix? When will it be available?

Once it is merged, it will be in the next nightly build if you want to try it (these are obviously not for production): https://github.com/fluent/fluent-bit/tree/master/.github/workflows#unstablenightly-builds
You can also build from PR #7492, and the PR should provide a specific PR image to use as well (once it is built for integration tests): https://github.com/fluent/fluent-bit/blob/master/dockerfiles/README.md#ghcrio-topology

It will then be in the next 2.1 series release after merging.

> How can I set the data_format variable? Will it be a new parameter?

Please look at the PR, it includes an example from @tarruda.

@MohamedEHJ

Hi, is there an estimated date for a release that includes a fix for this issue?
