Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential narrowing conversion bug in timeMicrosFromNative for time-micros logical type #242

Closed
mihaitodor opened this issue Mar 25, 2022 · 0 comments · Fixed by #243
Closed

Comments

@mihaitodor
Copy link
Collaborator

Given the following schema.avsc

{
  "type": "record",
  "name": "LongList2",
  "fields": [
    {
      "default": null,
      "name": "long_time_micros",
      "type": [
        "null",
        {
          "type": "long",
          "logicalType": "time-micros"
        }
      ]
    }
  ]
}

and the following JSON message:

{"long_time_micros":{"long":20192000000000}}

The following Main.java program

// > java --version
// openjdk 17 2021-09-14
// OpenJDK Runtime Environment Homebrew (build 17+0)
// OpenJDK 64-Bit Server VM Homebrew (build 17+0, mixed mode, sharing)
// > java -cp avro_1.11/avro-1.11.0.jar:avro_1.11/jackson-core-2.12.5.jar:avro_1.11/jackson-annotations-2.12.5.jar:avro_1.11/jackson-databind-2.12.5.jar:avro_1.11/slf4j-api-1.7.32.jar Main.java

import java.math.BigInteger;
import java.util.Arrays;
import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.specific.SpecificDatumReader;


public class Main {
    static String schemaJSON = """
{
	"type": "record",
	"name": "LongList2",
	"fields": [
		{
			"default": null,
			"name": "long_time_micros",
			"type": [
				"null",
				{
					"type": "long",
					"logicalType": "time-micros"
				}
			]
		}
	]
}
""";

    public static void main(String[] args) {
        // message = `{"long_time_micros":{"long":20192000000000}}`
        String messageHex = "028080e68faa9709";
        byte[] messageBytes = new BigInteger(messageHex, 16).toByteArray();

        System.out.println(Arrays.toString(messageBytes));

        try {
            Schema schema = new Schema.Parser().parse(schemaJSON);
            DatumReader<GenericRecord> reader = new SpecificDatumReader<GenericRecord>(schema);
            Decoder decoder = DecoderFactory.get().binaryDecoder(messageBytes, null);
            GenericRecord payload = reader.read(null, decoder);
            System.out.println(payload);
        }
        catch (Exception e) {
            System.out.println(e);
        }
    }
}

outputs {"long_time_micros": 20192000000000}, while the following main.go program

// > go version
// go version go1.17.8 darwin/amd64

package main

import (
	"fmt"
	"log"

	"github.com/linkedin/goavro/v2"
)

var (
	schemaJSON = `{
	"type": "record",
	"name": "LongList2",
	"fields": [
		{
			"default": null,
			"name": "long_time_micros",
			"type": [
				"null",
				{
					"type": "long",
					"logicalType": "time-micros"
				}
			]
		}
	]
}`

	// message = `{"long_time_micros":{"long":20192000000000}}`
	avroMessage = "\x02\x80\x80\x97\t"
)

func main() {
	var codec *goavro.Codec
	var err error
	if codec, err = goavro.NewCodecForStandardJSON(schemaJSON); err != nil {
		log.Fatalf("failed to parse response for schema: %v", err)
	}

	native, _, err := codec.NativeFromBinary([]byte(avroMessage))
	if err != nil {
		log.Fatalf("failed to call NativeFromBinary: %v", err)
	}

	jb, err := codec.TextualFromNative(nil, native)
	if err != nil {
		log.Fatalf("failed to call TextualFromNative: %v", err)
	}

	fmt.Println(string(jb))
}

outputs {"long_time_micros":{"long.time-micros":1358741504}} with github.com/linkedin/goavro/v2 v2.11.0.

It looks like this is caused by a narrowing conversion here where duration should probably be of type int64, in which case the cast to int32 should be removed. I'm not knowledgeable enough to read the Java code in detail to see if this is the type they use, but I believe the implementation starts here and the tests for it are here.

Note that my go code produces the same output as Java if I remove the int32 cast in timeMicrosFromNative for the time.Duration case. The tests in this package aren't checking this particular edge case.

I'm guessing this is a bug, but please let me know what you think.

mihaitodor added a commit to mihaitodor/goavro that referenced this issue Mar 25, 2022
Looks like this issue comes from a regression introduced in linkedin#213.

Fixes linkedin#242.
mihaitodor added a commit to mihaitodor/connect that referenced this issue Apr 5, 2022
When dealing with logical types, we can't serialise the native
struct emitted by goavro directly to JSON, since that will discard
schema information that `codec.TextualFromNative` uses to produce
the expected JSON.

Also, the tip version of goavro is required because the fix for
this regression linkedin/goavro#242 was
merged, but there isn't a tagged release yet.

Fixes redpanda-data#1161.
mihaitodor added a commit to mihaitodor/connect that referenced this issue Apr 5, 2022
When dealing with logical types, we can't serialise the native
struct emitted by goavro directly to JSON, since that will discard
schema information that `codec.TextualFromNative` uses to produce
the expected JSON.

Also, the tip version of goavro is required because the fix for
this regression linkedin/goavro#242 was
merged, but there isn't a tagged release yet.

Fixes redpanda-data#1161.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant