RFC: Parser Utility for Typescript #1334
Comments
Hi @Muthuveerappanv, thank you so much for taking the time to write this RFC. A Parser utility is definitely something we want to look into as it would help us advance towards our goal of feature parity with Powertools for Python, so this RFC is more than welcome. I have to admit that I am not very familiar with Zod as a library aside from having read about it in the past. I've read good things about it, especially when it comes to TypeScript support, so at least on the surface it seems like a sensible suggestion. Before committing to it I would like to have more info about it, both from the technical standpoint and in terms of project health and adoption. On the technical side, I'd like to understand:
On the project/governance side:
In terms of the content of the RFC, I also have a couple of followup questions:
Thank you again for the RFC, looking forward to fleshing it out! |
technical side
Zod uses a schema-first approach, which makes it strictly typed. It works with all of the internal Lambda trigger models and helps a great deal with custom models too, making it easier for developers to stick with a fully typed implementation of Lambda-based services. It also takes care of validation and transformation in one library.
Zod is very lightweight, so from a size perspective its unzipped
Please refer to the Requirements section.
It will not add much value in JavaScript-only codebases.
project/governance side
It's a very well documented and maintained project, with proper release cycles, clear release notes, and bug fixes.
Yes, there are multiple maintainers as well.
The release history should give you a good overview: https://github.com/colinhacks/zod/releases
terms of the content of the RFC
It was just printing the JSON schema after parsing. It's not part of the RFC, so I will remove it.
Added in the main comment - #1334 (comment)
I was thinking we will work on the different |
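To make the schema-first point from the technical-side answers concrete, here is a minimal, dependency-free sketch. The names `orderSchema` and `Order` are hypothetical and invented for illustration, and the hand-written `parse` only stands in for what a Zod schema (built with `z.object(...)` and typed via `z.infer`) would provide. It is not the RFC's implementation.

```typescript
// Hypothetical stand-in for a Zod-style schema: one object that both
// validates the input and transforms it into its final shape.
const orderSchema = {
  parse(input: unknown): { id: string; quantity: number } {
    if (typeof input !== "object" || input === null) {
      throw new TypeError("expected an object");
    }
    const { id, quantity: rawQuantity } = input as Record<string, unknown>;
    if (typeof id !== "string") {
      throw new TypeError("id must be a string");
    }
    if (typeof rawQuantity !== "string" && typeof rawQuantity !== "number") {
      throw new TypeError("quantity must be a string or a number");
    }
    // Transformation step: accept "3" or 3 and normalise to a number.
    const quantity = Number(rawQuantity);
    if (!Number.isInteger(quantity) || quantity <= 0) {
      throw new TypeError("quantity must be a positive integer");
    }
    return { id, quantity };
  },
};

// The static type is derived from the schema instead of being declared twice.
type Order = ReturnType<typeof orderSchema.parse>;

const order: Order = orderSchema.parse({ id: "A-1", quantity: "3" });
```

The design point is that the schema is the single source of truth: the runtime check and the compile-time type cannot drift apart because the type is computed from the schema.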
Thank you for clarifying the points and updating the RFC to address them, Muthu. I appreciate it. I guess for me the next step would be to familiarize myself with Zod and dive a bit deeper into how it works. In the meantime I'd like to encourage other readers to read the RFC and weigh in. |
Is it intentional that this library has two different RFCs for Validation (#508) and Parsing (this issue)? They seem very similar to me. I found this resource which defines the differences between validation and parsing, but it seems to me like they should be one feature for this library. I appreciate the clarification in advance and am curious about other perspectives on this. |
Hi @bestickley, yes this is intentional. It's true that there's an overlap between parsing and validating, however we have two RFCs and we intend to offer two separate utilities because we expect different types of customers (or workloads) to lean into one or the other. Based on the experience of Powertools for AWS (Python), we have seen that there's a good number of customers who have invested in developing JSON schemas or are simply used to working with those. These customers, and by extension workloads that are migrating to Lambda without major rearchitecting, might want to reach for a Validation utility that is able to process the schemas they already have. On the other hand, newer or greenfield workloads might want to go directly with a Parser utility and get both validation and parsing in one utility. Parsing however doesn't just provide a two-in-one experience, but also allows a degree of expressivity that the JSON schema spec simply doesn't support, as well as allowing transformation and advanced type-casting. Additionally, depending on the validation and parsing modules that we end up using, there's a chance that choosing between the two utilities will involve some level of performance tradeoff. At this stage it's too early to speak of this, but I wouldn't be surprised if it becomes another deciding factor. Ultimately, one of our tenets is to allow for progressive adoption & enhancement. Offering two separate utilities in this context allows us to serve customers at different stages of their Serverless journey. |
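The distinction drawn above between validating and parsing can be illustrated with a small, dependency-free sketch. The function names and the `sentAt` field are hypothetical. In practice the validation side would be driven by a JSON schema and the parsing side by a library such as Zod.

```typescript
// Validation: answers "is this input acceptable?" and leaves it untouched.
function isValidMessage(input: unknown): input is { sentAt: string } {
  if (typeof input !== "object" || input === null) {
    return false;
  }
  const sentAt = (input as Record<string, unknown>).sentAt;
  return typeof sentAt === "string" && !Number.isNaN(Date.parse(sentAt));
}

// Parsing: checks the input and returns a new, typed, transformed value.
function parseMessage(input: unknown): { sentAt: Date } {
  if (!isValidMessage(input)) {
    throw new TypeError("sentAt must be a date string");
  }
  // Transformation that validation alone cannot express:
  // the string becomes a Date object.
  return { sentAt: new Date(input.sentAt) };
}

const raw: unknown = { sentAt: "2024-01-01T00:00:00Z" };
const isValid = isValidMessage(raw);
const message = parseMessage(raw);
```

After validation the caller still holds the original string, whereas after parsing the caller holds a `Date`. That extra transformation step is the kind of expressivity the comment refers to.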
Hey all, quick update on the RFC. The proposal looks good and we will start breaking it into tasks and move forward with implementation. The key features are:
|
I have started scoping the work and listing the issues/tasks to create and I have a few points/questions that I would like to discuss. None of these is a blocker against a Parser utility based on Zod, however I think it'll be useful down the line to have these points recorded in the RFC. The points are not in any specific order:

1. Models vs Envelopes
Python Parser has Models that you can extend, at the same time it has the concept of envelopes which at least on the surface seems to have some overlap. In both sections of the docs they show similar events (EventBridge) and both appear to be two ways of defining the model/schema of the internal What’s the actual difference? The proposal in the RFC seems to conflate the two entities into one. Is this a result/construct of how Pydantic works or is it something that is functionally different? If so, how does this translate to Zod? And how does one bring their own envelope like in Python Parser?

2. Model vs Schema naming
Python uses the Should we use model and align with Python, or instead use schema to align with the Zod ecosystem? Our tenet of “They follow language idioms and their community’s common practices.” would suggest the latter but I think it’s important to agree on this from the beginning.

3. Re-exporting Zod
Python Parser re-exports Pydantic so that customers can do from Given that we are planning on including Zod as a dependency there’s an argument to be made in favor of following a similar strategy. At the same time, as far as I know this is not a common practice in the JS/TS ecosystem and I cannot think of any benefit of doing so, while there’s a non-zero chance that doing so will have an impact on bundling and tree-shaking. Thoughts?

4. Data model validation
Pydantic has a notion of validation (link) which is also explicitly called out in the Powertools Parser docs and that seems to be treated as a separate feature from the actual parsing.
From what I can see this is a choice made by Pydantic rather than Powertools Parser (from here):
Does Zod make a similar distinction? If so, what’s the equivalent of this in Zod? Does it make sense to have this distinction (in Pydantic this is done via class decorators which is for sure not compatible with how Zod works)?

5. Naming & Implementation
The proposal in the RFC uses Likewise, the RFC seems to suggest implementing this as a class. I would consider instead using an architecture similar to the one we used in the Idempotency utility, in which we have a parse function that has the bulk of the logic and then separately expose a Middy middleware, a decorator, etc.

6. Function wrapper
Generally speaking we try to have our utilities cover three types of usages: 1/ class method decorators, 2/ middy middleware, 3/ manual usage (aka classic function-based usage). For this specific utility, which is intended to primarily target parsing the Based on a first assessment of the implementation it looks like we are going to have a With this in mind, does it really add value to have a function wrapper versus just having customers call that at the top of their function, i.e.

export const handler = (rawEvent: unknown) => {
  const event = ZodSchema.parse(rawEvent);
  // ... rest of the code
};

As it stands decorator, middleware, and wrapper function are just half (or less) of the value of the Parser utility, and a lot of the value is in the schemas/models that we offer. If this is true then having a wrapper function doesn't really add much. If instead we make these APIs enhance Zod's experience with things like: 1/ handling JSON strings (which Zod doesn't do natively/in a straightforward way), 2/ enhancing error handling/extraction (which can be boilerplate-y in Zod), and other things, then having a function wrapper does make sense.

7. Testing strategy
For other utilities we have strived to reach 100% unit test coverage and have integration tests in each utility. Given that this utility relies on a certain set of inputs (the schemas and maybe envelopes) and that there isn't any AWS API interaction we should discuss the testing strategy. For unit tests, I think that unless Zod makes this impossible, we should continue having 100% test coverage for our code. This however implies that we have examples for AWS events that we want to support. How do we plan to acquire these events? And do we want to make any effort to programmatically keep them up to date? For integration tests, does it make any sense at all to have integration tests for this utility? For Batch Processing, which works similarly, we have opted for not having them so there's an argument in favor of doing the same here. Having integration tests in which we simply load the utility in a Lambda and we send artificial events as part of the test wouldn't test/prove any additional behavior that we are not already covering with the unit tests. At the same time, deploying all the kinds of resources needed to generate real events (and their failure modes) would require a significant effort which I'm not sure is justified by the value add. Thoughts? |
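Points 5 and 6 above (a core parse function holding the logic, with thin wrappers on top that add JSON-string handling and friendlier errors) could look roughly like the sketch below. Everything here is hypothetical: `Schema`, `ParseError`, `parse`, and `withParser` are illustrative names, not the utility's API. The `Schema` shape is chosen so that any object with a `parse(input)` method, which Zod schemas have, should fit it structurally.

```typescript
// Hypothetical minimal schema shape: anything exposing parse(input).
type Schema<T> = { parse(input: unknown): T };

// Hypothetical error type that wraps the underlying failure.
class ParseError extends Error {
  readonly originalError: unknown;
  constructor(message: string, originalError: unknown) {
    super(message);
    this.name = "ParseError";
    this.originalError = originalError;
  }
}

// Core function: holds the logic so decorator/middleware/wrapper stay thin.
// Assumption: string inputs are always meant to be JSON documents.
function parse<T>(data: unknown, schema: Schema<T>): T {
  let candidate = data;
  if (typeof data === "string") {
    try {
      candidate = JSON.parse(data);
    } catch (error) {
      throw new ParseError("Input is not valid JSON", error);
    }
  }
  try {
    return schema.parse(candidate);
  } catch (error) {
    throw new ParseError("Input does not match the schema", error);
  }
}

// Thin function wrapper built on top of the core function.
function withParser<T, R>(
  schema: Schema<T>,
  handler: (event: T) => R
): (rawEvent: unknown) => R {
  return (rawEvent) => handler(parse(rawEvent, schema));
}

// Hand-written schema used only to exercise the sketch.
const greetingSchema: Schema<{ name: string }> = {
  parse(input) {
    if (typeof input !== "object" || input === null) {
      throw new TypeError("expected an object");
    }
    const name = (input as Record<string, unknown>).name;
    if (typeof name !== "string") {
      throw new TypeError("name must be a string");
    }
    return { name };
  },
};

const handler = withParser(greetingSchema, (event) => `Hello, ${event.name}`);
```

With this split, a Middy middleware or a class method decorator would only need to call `parse` and replace the event, which is the same layering described for the Idempotency utility.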
Great points to foster the direction of this feature and resolve additional unknowns.
I agree that this is somewhat confusing. The built-in models are necessary so we can extend them with custom models of the payload. They also bring additional functionality to parse the payload based on the event source, e.g. an SQS message inside a Kinesis event. For instance, the As for the envelopes, there is a strong argument that in many situations, we are only interested in the payload of the event. But it takes several steps to get there. 1/ Understand the event structure, 2/ get the right field (was it
As you mentioned, following the tenet, we should stick 100% to the language domain of our ecosystem and keep it consistent. The trade-off I see is that it'd be more difficult for developers who build applications in both Python and TypeScript and use the same Powertools features.
We had similar discussions on the SDK re-export for parameters and decided to not re-export. I agree with your argument to not include it.
Yes, having a thin layer with the core logic in a base function is the best approach. We had similar learnings from Idempotency.
I think this is the direction we can aim for. But I don't yet have any specifics on the configuration or the context we can pass to a wrapper to provide more functionality. I'd suggest focusing on the decorator and Middy middleware first.
We can keep the event structure similar to
I agree it'd be too much overhead to have all the required services to send real events. For this case we need to collect real examples, and it's ok to start with a few. The collection will grow over time as we find more edge cases. |
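The envelope idea discussed in this reply (skip the event wrapper and go straight to the payload) can be sketched as follows. This is a dependency-free illustration under stated assumptions: `eventBridgeEnvelope`, `orderDetailSchema`, and the sample event values are all hypothetical, and only the top-level field names of the sample follow the usual EventBridge event layout.

```typescript
// Hypothetical minimal schema shape: anything exposing parse(input).
type Schema<T> = { parse(input: unknown): T };

// Hypothetical envelope: knows where the payload lives in the outer event
// and delegates the payload check to a caller-supplied schema.
function eventBridgeEnvelope<T>(event: unknown, detailSchema: Schema<T>): T {
  if (typeof event !== "object" || event === null) {
    throw new TypeError("expected an event object");
  }
  const detail = (event as Record<string, unknown>).detail;
  if (detail === undefined) {
    throw new TypeError("event has no detail field");
  }
  return detailSchema.parse(detail);
}

// Custom payload schema, written by hand for the sake of the example.
const orderDetailSchema: Schema<{ orderId: string }> = {
  parse(input) {
    if (typeof input !== "object" || input === null) {
      throw new TypeError("expected an object");
    }
    const orderId = (input as Record<string, unknown>).orderId;
    if (typeof orderId !== "string") {
      throw new TypeError("orderId must be a string");
    }
    return { orderId };
  },
};

// Made-up sample event; values are placeholders.
const sampleEvent = {
  version: "0",
  id: "00000000-0000-0000-0000-000000000000",
  "detail-type": "OrderCreated",
  source: "my.orders",
  account: "123456789012",
  time: "2024-01-01T00:00:00Z",
  region: "eu-west-1",
  resources: [],
  detail: { orderId: "o-123" },
};

const orderDetail = eventBridgeEnvelope(sampleEvent, orderDetailSchema);
```

The built-in model describes the whole event, while the envelope encodes the step of finding the right field so the handler only ever sees the typed payload.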
Hey Alex, thanks for the exhaustive answers, I agree on all points. I think we can start diving deeper into the implementation details of the utility. I have changed the status of the issue to Early next week I'll open a first set of issues to start tracking the work. After that, and once we get the next release (last of v1.x) out, you can start working on this. |
Hi all, zod is a CPU-heavy library which can be a performance bottleneck. They're tracking some improvement tickets but I haven't seen much progress. I suggest you explore alternative lightweight libs such as myzod (that's my choice for lambdas, otherwise I use zod in non-lambda code). |
Hi @byF, any chance that you could point to some benchmarks run on one of the current managed Lambda runtimes? We'd like to take a look so that we can better understand the impact. |
@dreamorosi sorry, I don't have any particular Lambda runtime-based benchmarks at my disposal. There is a general benchmark available: https://moltar.github.io/typescript-runtime-type-benchmarks/. I can also point you to a general zod issue regarding perf colinhacks/zod#205. Anecdotally, I saw a noticeable jump in CPU usage and bundle size after switching to zod. |
Hey @byF, thanks for raising this point. I have looked into the issue you mentioned and also the benchmarks. There are a lot of validation libraries with various performance benchmarks. For the parser utility we needed to decide on one based on a combination of open source health, security, popularity, feature set, and adoption rate. We think zod fits the criteria, but we might be wrong. With so many different choices there will always be a situation where a more performant library pops up on the radar and people want other libraries to support it (e.g. valibot). This is not an attempt to defend zod. I think there are other great projects like valibot or typia that we might support in the future. In an ideal world we would support most of the popular validation libraries, where you could bring your own parser and schema. As a next step I will run performance tests (#1955) to understand the impact of the parser utility, so we can be transparent and add this information to our documentation. |
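One way the "bring your own parser and schema" idea mentioned above might be approached is to have the utility depend on a tiny common shape and adapt each library to it. The sketch below is purely illustrative: `Parser`, `fromParseMethod`, `fromGuard`, and `parseEvent` are hypothetical names, and no claim is made about the actual APIs of valibot or typia.

```typescript
// Common denominator: anything that turns unknown input into T or throws.
type Parser<T> = (input: unknown) => T;

// Adapter for libraries whose schemas expose a parse(input) method.
function fromParseMethod<T>(schema: { parse(input: unknown): T }): Parser<T> {
  return (input) => schema.parse(input);
}

// Adapter for libraries (or hand-written code) built around type guards.
function fromGuard<T>(
  guard: (input: unknown) => input is T,
  label: string
): Parser<T> {
  return (input) => {
    if (!guard(input)) {
      throw new TypeError(`input is not a valid ${label}`);
    }
    return input;
  };
}

// The utility itself would only ever see the common Parser shape.
function parseEvent<T>(event: unknown, parser: Parser<T>): T {
  return parser(event);
}

type User = { name: string };

// Hand-written type guard standing in for a guard-style library.
function isUser(input: unknown): input is User {
  return (
    typeof input === "object" &&
    input !== null &&
    typeof (input as Record<string, unknown>).name === "string"
  );
}

// Hand-written schema object standing in for a parse-method library.
const userSchema = {
  parse(input: unknown): User {
    if (!isUser(input)) {
      throw new TypeError("input is not a valid User");
    }
    return { name: input.name };
  },
};

const viaSchema = parseEvent({ name: "Ada" }, fromParseMethod(userSchema));
const viaGuard = parseEvent({ name: "Ada" }, fromGuard(isUser, "User"));
```

The trade-off of such an abstraction is that library-specific features, such as rich error details or transforms, are only available to the extent the common shape exposes them.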
For those who are looking for performance, I suggest typia. The library is really fast because it generates optimized code at build time instead of generating/parsing at runtime. You can use the types defined on You don't even need to ship the |
We just launched the first beta version of the utility based on Zod. It's available starting from version v2.1.0 and we are looking at gathering feedback over the next few weeks to correct any issues and remove any sharp edges. We encourage you to give it a try and provide feedback. |
This issue is now closed. Please be mindful that future comments are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so. |
Is this related to an existing feature request or issue?
No response
Which AWS Lambda Powertools utility does this relate to?
Other
Summary
Parser Utility for Typescript
Powertools for Python has a parser utility that uses pydantic as the underlying library. There is a similar need on the TypeScript side. Zod will be a great fit for the parser utility in TypeScript. It has a lot of similarities with pydantic and would be a great fit for Powertools.
Use case
Parsers for Powertools Typescript
Data model parsing is one of the most widely used utilities when building services / Lambdas. When it comes to TypeScript, there are very few libraries that do this job really well. Zod is definitely at the top of this list.
Proposal
Parser Utility
Built-In Zod Schema
Sample EventbridgeSchema (Zod)
Eventbridge Custom detail implementation with Zod model
npm install zod
Lambda Handler - parser Decorator function
Lambda Handler - parser middy middleware
Out of scope
Potential challenges
generics and transforms in Zod as it would be much needed for the final cut implementation
Dependencies and Integrations
No response
Alternative solutions
No response
Acknowledgment