-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(codec): Add csv encoding for sinks #16603
Conversation
✅ Deploy Preview for vector-project ready!
To edit notification comments on pull requests, go to your Netlify site settings. |
✅ Deploy Preview for vrl-playground ready!
To edit notification comments on pull requests, go to your Netlify site settings. |
Regression Detector ResultsRun ID: b3160919-f52c-4e5b-af18-c3e3473be923 ExplanationA regression test is an integrated performance test for The table below, if present, lists those experiments that have experienced a statistically significant change in mean optimization goal performance between baseline and comparison SHAs with 90.00% confidence OR have been detected as newly erratic. Negative values mean that baseline is faster, positive comparison. Results that do not exhibit more than a ±5.00% change in their mean optimization goal are discarded. An experiment is erratic if its coefficient of variation is greater than 0.1. The abbreviated table will be omitted if no interesting change is observed. Changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%:
Fine details of change detection per experiment.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a UX perspective this seems to be in line, as it's implemented as a codec.
But I'll defer final approval from others in the @vectordotdev/ux-team.
Flagged some cosmetic details
lib/codecs/src/encoding/mod.rs
Outdated
@@ -387,6 +406,8 @@ impl SerializerConfig { | |||
pub enum Serializer { | |||
/// Uses an `AvroSerializer` for serialization. | |||
Avro(AvroSerializer), | |||
/// Uses an `CsvSerializer` for serialization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Uses an `CsvSerializer` for serialization. | |
/// Uses a `CsvSerializer` for serialization. |
lib/codecs/src/encoding/mod.rs
Outdated
@@ -197,6 +197,15 @@ pub enum SerializerConfig { | |||
avro: AvroSerializerOptions, | |||
}, | |||
|
|||
/// Encodes an event as an CSV message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Encodes an event as an CSV message. | |
/// Encodes an event as a CSV message. |
} | ||
} | ||
|
||
/// The data type of events that are accepted by `JsonSerializer`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// The data type of events that are accepted by `JsonSerializer`. | |
/// The data type of events that are accepted by `CsvSerializer`. |
/// Build the `CsvSerializer` from this configuration. | ||
pub fn build(&self) -> Result<CsvSerializer, BuildError> { | ||
if self.fields.is_empty() { | ||
Err("At least one csv field must be specified".into()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Err("At least one csv field must be specified".into()) | |
Err("At least one CSV field must be specified".into()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's an undocumented/untested requirement here that fields are encoded in a consistent order as well. I believe that should be happening with the existing implementation due to the Vec<OwnedValuePath>
but it would be wise to ensure that behavior doesn't change going forward.
#[derive(Debug, Clone, Default)] | ||
pub struct CsvSerializerConfig { | ||
/// The CSV fields. | ||
pub fields: Vec<OwnedValuePath>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not wholly convinced by this configuration option, we have encoding.only_fields
configuration option for Sinks today which covers much of the same purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I also considered this config. But in order to use only_config
from the transformer, we need to build the serializer with transformer as an argument. That seems a little confusing, and would also introduce a circular import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make this config deeper in csv
context, like encoding.csv.fields
, maybe that seems better?
Hi, I've added a test for field sequence, and some docs for this configuration. |
Regression Detector ResultsRun ID: 7d3baf04-670c-41a3-b984-eb7254275d0a ExplanationA regression test is an integrated performance test for The table below, if present, lists those experiments that have experienced a statistically significant change in mean optimization goal performance between baseline and comparison SHAs with 90.00% confidence OR have been detected as newly erratic. Negative values mean that baseline is faster, positive comparison. Results that do not exhibit more than a ±5.00% change in their mean optimization goal are discarded. An experiment is erratic if its coefficient of variation is greater than 0.1. The abbreviated table will be omitted if no interesting change is observed. No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%. Fine details of change detection per experiment.
|
Regression Detector ResultsRun ID: 3be9e8e4-92ba-4aad-88ab-47736aea1e54 ExplanationA regression test is an integrated performance test for The table below, if present, lists those experiments that have experienced a statistically significant change in mean optimization goal performance between baseline and comparison SHAs with 90.00% confidence OR have been detected as newly erratic. Negative values mean that baseline is faster, positive comparison. Results that do not exhibit more than a ±5.00% change in their mean optimization goal are discarded. An experiment is erratic if its coefficient of variation is greater than 0.1. The abbreviated table will be omitted if no interesting change is observed. No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%. Fine details of change detection per experiment.
|
/// | ||
/// Some value types: Array, Regex, Object are not supported by the CSV format. | ||
/// If a field is of an unsupported type, the output will be `NaN`. | ||
pub fields: Vec<ConfigOwnedValuePath>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just making sure the choice of ValuePath
here was intentional, instead of TargetPath
. This will prevent event metadata from being serialized. I'm not seeing a great reason to restrict it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: If this is switched, an equivalent ConfigOwnedTargetPath
doesn't exist yet, but I can quickly make one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems OK to change it to TargetPath
. I chose ConfigOwnedValuePath
since it is also used in Transformer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I've tried making a ConfigOwnedTargetPath
.
Unlike ConfigOwnedValuePath
, we need to implement Default
for it. Is it OK to use PathPrefix::Event
as default prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's correct, but @fuchsnj can you confirm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConfigOwnedTargetPath
shouldn't have Default
either. CsvSerializerOptions
and CsvSerializerConfig
can have Default
removed since they are unused / also not needed.
You're going to need to merge to master to get Check Spelling to run. |
Regression Detector ResultsRun ID: 3616a028-f175-418c-a2b5-b710770f5da4 ExplanationA regression test is an integrated performance test for The table below, if present, lists those experiments that have experienced a statistically significant change in mean optimization goal performance between baseline and comparison SHAs with 90.00% confidence OR have been detected as newly erratic. Negative values mean that baseline is faster, positive comparison. Results that do not exhibit more than a ±5.00% change in their mean optimization goal are discarded. An experiment is erratic if its coefficient of variation is greater than 0.1. The abbreviated table will be omitted if no interesting change is observed. Changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%:
Fine details of change detection per experiment.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small update, still discussing internally - sorry for the delay!
Regression Detector ResultsRun ID: f2388ba1-5ea9-45da-8105-be3d223b057b ExplanationA regression test is an integrated performance test for The table below, if present, lists those experiments that have experienced a statistically significant change in mean optimization goal performance between baseline and comparison SHAs with 90.00% confidence OR have been detected as newly erratic. Negative values mean that baseline is faster, positive comparison. Results that do not exhibit more than a ±5.00% change in their mean optimization goal are discarded. An experiment is erratic if its coefficient of variation is greater than 0.1. The abbreviated table will be omitted if no interesting change is observed. No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%. Fine details of change detection per experiment.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#16728 was merged into master after CI passed on this PR. This would have broken master if this PR was merged in, so I went ahead and merged master back into this branch and fixed what it broke. That takes care of the comments about Default
I had, so I'm good with this PR now.
Regression Detector ResultsRun ID: 1c4b7c0a-c148-4f7d-b305-8cfb33737cd9 ExplanationA regression test is an integrated performance test for The table below, if present, lists those experiments that have experienced a statistically significant change in mean optimization goal performance between baseline and comparison SHAs with 90.00% confidence OR have been detected as newly erratic. Negative values mean that baseline is faster, positive comparison. Results that do not exhibit more than a ±5.00% change in their mean optimization goal are discarded. An experiment is erratic if its coefficient of variation is greater than 0.1. The abbreviated table will be omitted if no interesting change is observed. Changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%:
Fine details of change detection per experiment.
|
fix: #7124