feat: clickhouse.JSON Serializer interface #1491

Merged 7 commits on Feb 7, 2025

@SpencerTorres (Member) commented Feb 6, 2025

Summary

After running benchmarks (#1490), it's clear that the clickhouse.JSON type is the fastest way to append JSON data (excluding strings).

Right now, converting a struct to this JSON data relies on reflection to walk the struct recursively, unless the user writes their own functions to build a clickhouse.JSON by hand. To make this optimization more apparent, I have added the clickhouse.JSONSerializer and clickhouse.JSONDeserializer interfaces.

If you have a custom struct, you can implement these interfaces and the JSON column will use them when reading and writing data.
A helper function, clickhouse.ExtractJSONPathAs[T](jsonObj, path), is also included for reading a path as a specific type. Users can also do this manually if they prefer.
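
To make the shape of these interfaces concrete, here is a minimal, self-contained sketch. The `JSON` type below is a hypothetical stand-in for clickhouse.JSON (a flat map keyed by dotted path), `ValueAtPath` and `ExtractPathAs` are illustrative names, and the error return of the helper is an assumption. Only the two interface method signatures are taken from the example in this PR.

```go
package main

import "fmt"

// JSON is a hypothetical stand-in for clickhouse.JSON: a flat map from
// dotted path to value. The real type lives in clickhouse-go.
type JSON struct {
	values map[string]any
}

func NewJSON() *JSON { return &JSON{values: make(map[string]any)} }

// SetValueAtPath stores a value under a dotted path such as "pricing.price".
func (j *JSON) SetValueAtPath(path string, value any) { j.values[path] = value }

// ValueAtPath returns the raw value stored at path, if any (assumed accessor).
func (j *JSON) ValueAtPath(path string) (any, bool) {
	v, ok := j.values[path]
	return v, ok
}

// JSONSerializer and JSONDeserializer mirror the method signatures
// used in this PR's example.
type JSONSerializer interface {
	SerializeClickHouseJSON() (*JSON, error)
}

type JSONDeserializer interface {
	DeserializeClickHouseJSON(obj *JSON) error
}

// ExtractPathAs sketches a typed lookup: find the path, then assert to T.
func ExtractPathAs[T any](obj *JSON, path string) (T, error) {
	var zero T
	raw, ok := obj.ValueAtPath(path)
	if !ok {
		return zero, fmt.Errorf("path %q not found", path)
	}
	typed, ok := raw.(T)
	if !ok {
		return zero, fmt.Errorf("path %q holds %T, not %T", path, raw, zero)
	}
	return typed, nil
}

func main() {
	obj := NewJSON()
	obj.SetValueAtPath("pricing.price", int64(750))

	price, err := ExtractPathAs[int64](obj, "pricing.price")
	fmt.Println(price, err)

	// Asking for the wrong type yields an error rather than a panic.
	_, err = ExtractPathAs[string](obj, "pricing.price")
	fmt.Println(err)
}
```

The generic helper keeps the type assertion in one place, so a deserializer body can stay a flat list of path lookups.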

Other changes:

  • Added test for json.Marshal to confirm it is using the NestedMap func
  • Reduced duplicated struct code for JSON tests

Example

type ProductPricing struct {
	Price    int64  `json:",omitempty"`
	Currency string `json:",omitempty"`
}

type Product struct {
	ID        clickhouse.Dynamic `json:"id"`
	Name      string             `json:"name"`
	Tags      []string           `json:"tags"`
	Pricing   ProductPricing     `json:"pricing"`
	Metadata  map[string]any     `json:"metadata"`
	CreatedAt time.Time          `json:"created_at" chType:"DateTime64(3)"`
}

// SerializeClickHouseJSON implements clickhouse.JSONSerializer for faster struct appending
func (p *Product) SerializeClickHouseJSON() (*clickhouse.JSON, error) {
	obj := clickhouse.NewJSON()
	obj.SetValueAtPath("id", p.ID)
	obj.SetValueAtPath("name", p.Name)
	obj.SetValueAtPath("tags", p.Tags)
	obj.SetValueAtPath("pricing.price", p.Pricing.Price)
	obj.SetValueAtPath("pricing.currency", p.Pricing.Currency)
	obj.SetValueAtPath("metadata.region", p.Metadata["region"])
	obj.SetValueAtPath("metadata.page_count", p.Metadata["page_count"])
	obj.SetValueAtPath("created_at", p.CreatedAt)

	return obj, nil
}

// DeserializeClickHouseJSON implements clickhouse.JSONDeserializer for faster struct scanning
func (p *Product) DeserializeClickHouseJSON(obj *clickhouse.JSON) error {
	p.ID, _ = clickhouse.ExtractJSONPathAs[clickhouse.Dynamic](obj, "id")
	p.Name, _ = clickhouse.ExtractJSONPathAs[string](obj, "name")
	p.Tags, _ = clickhouse.ExtractJSONPathAs[[]string](obj, "tags")
	p.Pricing.Price, _ = clickhouse.ExtractJSONPathAs[int64](obj, "pricing.price")
	p.Pricing.Currency, _ = clickhouse.ExtractJSONPathAs[string](obj, "pricing.currency")
	p.Metadata = make(map[string]any, 2)
	p.Metadata["region"], _ = clickhouse.ExtractJSONPathAs[string](obj, "metadata.region")
	p.Metadata["page_count"], _ = clickhouse.ExtractJSONPathAs[int64](obj, "metadata.page_count")
	p.CreatedAt, _ = clickhouse.ExtractJSONPathAs[time.Time](obj, "created_at")

	return nil
}

Then inside a batch append:

batch.Append(&product)

The underlying column implementation will then choose to use the user's clickhouse.JSON instead of building its own from reflection.
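
The selection described above can be pictured as a type assertion with a reflection fallback. This is only a sketch under assumptions: `JSON`, `toJSON`, and `reflectJSON` are hypothetical stand-ins and not the driver's actual column code, and the fallback here handles top-level exported fields only.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// JSON is a hypothetical stand-in for clickhouse.JSON.
type JSON struct{ values map[string]any }

func NewJSON() *JSON { return &JSON{values: map[string]any{}} }

func (j *JSON) SetValueAtPath(path string, v any) { j.values[path] = v }

// JSONSerializer mirrors the method signature from the example above.
type JSONSerializer interface {
	SerializeClickHouseJSON() (*JSON, error)
}

// toJSON prefers the user's serializer and only falls back to reflection.
// The bool reports whether the fast path was taken.
func toJSON(row any) (*JSON, bool, error) {
	if s, ok := row.(JSONSerializer); ok {
		obj, err := s.SerializeClickHouseJSON()
		return obj, true, err
	}
	obj, err := reflectJSON(row)
	return obj, false, err
}

// reflectJSON is a simplified fallback: top-level exported fields only,
// keyed by lowercased field name. A real implementation would recurse
// into nested values and honor struct tags.
func reflectJSON(row any) (*JSON, error) {
	v := reflect.Indirect(reflect.ValueOf(row))
	if v.Kind() != reflect.Struct {
		return nil, fmt.Errorf("unsupported type %T", row)
	}
	obj := NewJSON()
	for i := 0; i < v.NumField(); i++ {
		f := v.Type().Field(i)
		if f.IsExported() {
			obj.SetValueAtPath(strings.ToLower(f.Name), v.Field(i).Interface())
		}
	}
	return obj, nil
}

// Fast implements the serializer on a pointer receiver.
type Fast struct{ Name string }

func (f *Fast) SerializeClickHouseJSON() (*JSON, error) {
	obj := NewJSON()
	obj.SetValueAtPath("name", f.Name)
	return obj, nil
}

// Plain has no serializer, so it goes through reflection.
type Plain struct{ Name string }

func main() {
	_, fast, _ := toJSON(&Fast{Name: "a"})
	fmt.Println("Fast used serializer:", fast)

	_, fast, _ = toJSON(&Plain{Name: "b"})
	fmt.Println("Plain used serializer:", fast)
}
```

Because the methods in the example use pointer receivers, only a pointer satisfies the interface in Go, which is why the append call passes `&product` rather than `product`.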

The same applies to Scan:

var product Product
err := rows.Scan(&product)

The struct will be populated using the implemented interface.
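
The deserializer in the example discards extraction errors with `_`. If surfacing them is preferred, one possible variant collects them and returns a joined error. This is a sketch, not the PR's implementation: `JSON`, `extractAs`, `scanInto`, and `Pricing` are hypothetical stand-ins, and `errors.Join` assumes Go 1.20 or newer.

```go
package main

import (
	"errors"
	"fmt"
)

// JSON is a hypothetical stand-in for clickhouse.JSON.
type JSON struct{ values map[string]any }

// extractAs is a stand-in for clickhouse.ExtractJSONPathAs.
func extractAs[T any](obj *JSON, path string) (T, error) {
	var zero T
	raw, ok := obj.values[path]
	if !ok {
		return zero, fmt.Errorf("path %q not found", path)
	}
	typed, ok := raw.(T)
	if !ok {
		return zero, fmt.Errorf("path %q: unexpected type %T", path, raw)
	}
	return typed, nil
}

// JSONDeserializer mirrors the method signature from the example above.
type JSONDeserializer interface {
	DeserializeClickHouseJSON(obj *JSON) error
}

type Pricing struct {
	Price    int64
	Currency string
}

// DeserializeClickHouseJSON reports every failed path instead of
// silently leaving zero values behind.
func (p *Pricing) DeserializeClickHouseJSON(obj *JSON) error {
	var errPrice, errCurrency error
	p.Price, errPrice = extractAs[int64](obj, "price")
	p.Currency, errCurrency = extractAs[string](obj, "currency")
	return errors.Join(errPrice, errCurrency) // nil when both are nil
}

// scanInto sketches the read-side dispatch on the interface.
func scanInto(dest any, obj *JSON) error {
	d, ok := dest.(JSONDeserializer)
	if !ok {
		return fmt.Errorf("%T does not implement JSONDeserializer", dest)
	}
	return d.DeserializeClickHouseJSON(obj)
}

func main() {
	var p Pricing
	good := &JSON{values: map[string]any{"price": int64(750), "currency": "USD"}}
	fmt.Println(scanInto(&p, good), p)

	bad := &JSON{values: map[string]any{"price": "oops"}}
	fmt.Println(scanInto(&p, bad))
}
```

Whether to ignore or return these errors is a design choice: ignoring them tolerates missing optional paths, while returning them catches schema drift early.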

Performance

By letting the user choose exactly how the object is serialized/deserialized, we can save CPU time and reduce the number of allocations (note that bytes allocated per read go up on the fast path in the numbers below):

Serialization:

BenchmarkJSONInsert/structs-32       	  275416	      3923 ns/op	    5421 B/op	      55 allocs/op
BenchmarkJSONInsert/fast_structs-32  	  468788	      2497 ns/op	    4968 B/op	      27 allocs/op

Deserialization:

BenchmarkJSONRead/structs-32                   	 2793265	       410.0 ns/op	     416 B/op	      15 allocs/op
BenchmarkJSONRead/fast_structs-32              	 3078796	       346.9 ns/op	    1040 B/op	       8 allocs/op
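
To read these numbers as relative changes, a small calculation helps. The sketch below only re-uses the figures printed above; `percentChange` is an illustrative helper, not part of the driver.

```go
package main

import "fmt"

// percentChange returns the relative change from before to after, in percent.
// Negative values mean the fast path uses less of the resource.
func percentChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Numbers copied from the benchmark output above (structs vs fast_structs).
	rows := []struct {
		name          string
		before, after float64
	}{
		{"insert ns/op", 3923, 2497},
		{"insert B/op", 5421, 4968},
		{"insert allocs/op", 55, 27},
		{"read ns/op", 410.0, 346.9},
		{"read B/op", 416, 1040},
		{"read allocs/op", 15, 8},
	}
	for _, r := range rows {
		fmt.Printf("%-18s %+7.1f%%\n", r.name, percentChange(r.before, r.after))
	}
}
```

In these runs, insert time drops by about 36% and read time by about 15%, allocation counts roughly halve in both cases, and bytes per read rise by 150%.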

Checklist

Delete items not relevant to your PR:

  • Unit and integration tests covering the common scenarios were added

@jkaflik (Contributor) left a comment

Looks good 🚀

lib/chcol/json.go (outdated, resolved)
examples/clickhouse_api/main_test.go (resolved)
@SpencerTorres SpencerTorres merged commit 8d87c23 into main Feb 7, 2025
12 checks passed
@SpencerTorres SpencerTorres deleted the json_serializer branch February 7, 2025 22:25