Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C Implementation #160

Closed
zkat opened this issue Sep 15, 2021 · 14 comments
Closed

C Implementation #160

zkat opened this issue Sep 15, 2021 · 14 comments
Labels
help wanted Extra attention is needed implementations

Comments

@zkat
Copy link
Member

zkat commented Sep 15, 2021

Some people seem to really want specifically a C implementation that can be fast and widely-used. So if you're interested, this is the issue to take on!

Me, I can't do C for shit so don't look at me, I can't do it :P

@zkat zkat added help wanted Extra attention is needed implementations labels Sep 15, 2021
@garrisonhh
Copy link

garrisonhh commented Sep 15, 2021

I'd absolutely love to take this on, the kind of implementation I would want to do by default would be something very similar to my json library. However, it's single-header, loads entire files at once, and uses a memory model which might be unfamiliar to some devs.

Besides conforming to the specification, what kinds of features would you be looking for in a C implementation? Is a streaming parser a better idea? How would you want serialization to look? How much minimalism, performance, and flexibility should be sacrificed for abstraction? Are there json, xml, or yaml libraries for C that you really like the design of?

Obviously I haven't written anything yet and the project would change as it went on of course, but I'd love to hear your thoughts!

@zkat
Copy link
Member Author

zkat commented Sep 15, 2021

I'm not a C developer so I don't really know what would be best. My thoughts on C are that there are two main use cases:

  1. direct C usage by C developers for C applications
  2. Other language wrappers around C

The first seems like it would work really well with what you're proposing here. The latter might be better server by a very high performance streaming parser, so other languages can use C as a perf boost.

I think there's room for both so you can do whichever you find more fun! What do you think?

@garrisonhh
Copy link

I'm more comfortable with the first use case, but writing something focused on performance and language interop sounds absolutely dope! I'll give it a shot and get back to you

@zkat
Copy link
Member Author

zkat commented Sep 16, 2021

@garrisonhh one thing that might be of interest is https://github.com/lucretiel/kaydle, which has a Rust implementation optimized for streaming parsing. It might be too Rust-flavored to work for C, tho?

@swift2plunder
Copy link
Contributor

The basic interface for language wrappers is a stack machine. SWIG bindings and FastFFI bindings are a little more complicated, but I took a quick look at the API for the library you linked and that style would be great

Your JSON library's access and modification APIs could reasonably be copied here, but it would be even better to have an API informed by the query specification

@garrisonhh
Copy link

Cool, I don't really know rust but the structure gives me some ideas. I'm creating a low-level token stream and then going to build higher-level stuff on top of that. I would love to recreate a similar API to the json lib, but it's only really as convenient as it is due to the DOM structure - I'll see what I can do though. Using the query specification also makes a lot of sense. Thanks for the input!

@garrisonhh
Copy link

Hey guys, a couple questions about the spec:

  • Are args/props ordered?
  • Are nodes within a document or child block ordered?
  • Could all types of non-null values be allowed as keys? The syntax is very clear when something is a node id or a prop id vs. a value, so I think allowing my-node 3=4 true=false could be pretty cool

I also think making type annotations mandatory could make some sense, if they can't be relied upon consistently across languages why use them outside of expressing intent?

Outside of that, I'm starting to get somewhere with my implementation, I've arrived at a design that I like, and as far as performance goes it's gonna be pretty goddamn fast. If you are aware of any large-scale KDL files (like 50MB+) that I can use for testing I would really appreciate it! I can always do some python magic if none exist that you know of.

@zkat
Copy link
Member Author

zkat commented Sep 21, 2021

  • args are ordered, props are not.
  • props follow a last-prop-wins rule for duplicate entries.
  • children/document nodes MUST be ordered
  • keywords cannot be identifiers (null, true, false)

@eXeC64
Copy link

eXeC64 commented Sep 27, 2021

Hello all. It seem garrisonhh and I took the up the challenge to write a C parser for KDL at around the same time. However I didn't see or comment in this thread, so didn't see I was duplicating efforts with garrisonhh. That said, I don't think there's any harm in having multiple implementations available. It sounds like we've got slightly different priorities anyway; I've aimed for maximum portability without worrying much about performance, while garrisonhh's solution sounds like it'd perform better.

Mine's more or less "done" and passing a fairly thorough test suite, but is not yet battle hardened. Reviews would be appreciated:

https://git.sr.ht/~exec64/freekdl

One big difference I noticed with my implementation to the official test suite is that the suite allows for type annotations to be quoted strings, but the spec as I understand it (and my implementation) only permit bare identifiers to be type annotations.

@garrisonhh
Copy link

Very cool! The API looks clean at a glance and from what I can see in the test file. I think it's fascinating that we independently used similar terminology like "consume" and "expect". I think it would be awesome for critique if you also provided more examples of usage in the README or as source files!

I'm definitely very zoned in on performance and memory usage so I don't think we're really stepping on each others toes. I'm pretty close to a state I'd feel comfortable sharing, so we can compare notes then!

@eXeC64
Copy link

eXeC64 commented Sep 27, 2021

Thanks. The main thing I'd like to do is enhance test coverage a bit more (specifically on input that should fail to parse), and improve error message output. Right now error messages at least say where in the input it failed, and roughly what went wrong, but it could definitely be more informative. There's also a few code cleanups that could happen, as the design was developed fairly organically beyond knowing that I'd have a distinct tokeniser stage.

The other main thing my implementation is lacking is convenience functions for numbers. I currently only expose them as strings, but should provide a few helper functions to get/set them as double, uint32_t, int32_t, uint64_t, int64_t.

Lastly, I'd like to aim a fuzzer, valgrind, and other tooling at it to find any possible crashes or vulnerabilities.

@tjol
Copy link
Contributor

tjol commented Sep 16, 2022

I thought this sounded like a fun week-end project...

... rather more than a weekend later, I have a library that appears to work (and passes the test suite here), but probably still handles some edge cases incorrectly. In a burst of creative naming, I called it ckdl - yeah, I know.

https://github.com/tjol/ckdl

I went a very different route to @eXeC64 in terms of API design: instead of reading the document into a C structure, ckdl provides a stream of high-level parse events much like SAX does for XML. The idea here is that while this may be inconvenient for C applications, it's great for wrapper libraries in other languages, because you get to load the data into your own structures pretty directly.

Since the idea was to enable wrapper libraries in other languages, I went ahead and coded up minimal wrappers for Python and C++.

@zkat
Copy link
Member Author

zkat commented Sep 16, 2022

@tjol this is really awesome! Please go ahead and PR this to this repo's README as a C implementation (and do the same for https://github.com/kdl-org/kdl-org.github.io/blob/main/src/_includes/partials/implementations.md)!

I'm gonna close this issue for now, since we now actually have a C implementation, even if it'll be nice to have the struct-based one too :)

@zkat zkat closed this as completed Sep 16, 2022
@zkat
Copy link
Member Author

zkat commented Sep 16, 2022

oh and you probably wanna add it to Python and C++ too :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed implementations
Projects
None yet
Development

No branches or pull requests

5 participants