ingest unstructured json records or capture unrecognized fields #12207
Comments
Will be tracked on User-requested issues (Notion) |
IIUC, this can be done by defining a generated column accessing that JSONB. |
This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned. |
FYI, this is somewhat similar to
    #[derive(Serialize, Deserialize)]
    struct S {
        a: u32,
        b: String,
        #[serde(flatten)]
        other: Map<String, Value>,
    }
from serde-rs/serde#941 (comment). We need to find a way to mark the jsonb column. |
We have been enhancing the schemaless features recently. Shall we work this out based on
    CREATE TABLE t1 (
        -- can be empty
    )
    INCLUDE PAYLOAD AS payload JSONB
    WITH (
        connector = 'kafka',
        topic = 'test_include_key'
    )
    FORMAT PLAIN ENCODE JSON
where the |
Related: #17959
|
Yeah, this leverages |
#17650 (comment) proposed another variant of syntax
|
I prefer this approach because the new collecting column has to be JSON type. |
I still vote for Conversely, if we choose |
From the impl side, this approach is more doable.
In the original design, |
This came out of a discussion with @fuyufjh, inspired by a user's question.
When a user has some rows in JSON format, he would like to:
Right now, for (1), if a JSON field is in the row data but is not defined in the table schema, it is not parsed or ingested into RisingWave.
For (2), right now, the user has to wrap the entire row in another JSON field, e.g. 'data': {the original data}. As a result, the user often has to run another transformation before ingesting data into RisingWave. However, the user may not control the data format at the source, as the source data is collected by some other data team. Enabling both would let users do more of their ETL workload in RW instead of bringing up another system.
If we give users the option to group all JSON fields that are not defined in the table schema into one field, (2) is naturally solved.
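To make the grouping idea concrete, here is a minimal sketch in plain Rust (std only). It is not RisingWave code: `split_record` is a hypothetical helper, and it assumes the record is already parsed into field name to raw JSON text, with the table schema given as a set of column names.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper: splits one parsed record into
/// (fields declared in the schema, all remaining fields).
/// Values are kept as raw JSON text to avoid any dependency.
fn split_record(
    record: &BTreeMap<String, String>,
    schema: &BTreeSet<&str>,
) -> (BTreeMap<String, String>, BTreeMap<String, String>) {
    let mut columns = BTreeMap::new();
    let mut rest = BTreeMap::new();
    for (name, value) in record {
        if schema.contains(name.as_str()) {
            // Declared in the schema: becomes a regular column.
            columns.insert(name.clone(), value.clone());
        } else {
            // Not declared: collected into the catch-all field.
            rest.insert(name.clone(), value.clone());
        }
    }
    (columns, rest)
}
```

With an empty schema, every field ends up in `rest`, which is the "ingest the whole row as one JSON value" case without the extra wrapping step.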
Also, the user often wants to take the primary key out as a single column to be the primary key, to deduplicate the source stream of the table, but keep everything else in a huge data JSONB column.
Welcome more observations and counter-examples.
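A rough sketch of the primary-key case, again in std-only Rust and not tied to any RisingWave internals. It assumes the intent is to keep the latest row per key; `dedup_by_key` is a hypothetical name, and each row is modeled as a (primary key, raw JSON text of everything else) pair.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: keeps only the latest `data` payload per primary key.
/// Each input row is (primary key, raw JSON text of the remaining fields).
fn dedup_by_key(rows: Vec<(String, String)>) -> BTreeMap<String, String> {
    let mut table = BTreeMap::new();
    for (pk, data) in rows {
        // A later row with the same key overwrites the earlier one.
        table.insert(pk, data);
    }
    table
}
```

Only the key needs to be declared up front here; the payload stays opaque, which matches the "one key column plus one data column" shape described above.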