C Implementation #160

zkat · 2021-09-15T00:56:42Z

Some people seem to really want specifically a C implementation that can be fast and widely-used. So if you're interested, this is the issue to take on!

Me, I can't do C for shit so don't look at me, I can't do it :P

garrisonhh · 2021-09-15T13:16:09Z

I'd absolutely love to take this on, the kind of implementation I would want to do by default would be something very similar to my json library. However, it's single-header, loads entire files at once, and uses a memory model which might be unfamiliar to some devs.

Besides conforming to the specification, what kinds of features would you be looking for in a C implementation? Is a streaming parser a better idea? How would you want serialization to look? How much minimalism, performance, and flexibility should be sacrificed for abstraction? Are there json, xml, or yaml libraries for C that you really like the design of?

Obviously I haven't written anything yet and the project would change as it went on of course, but I'd love to hear your thoughts!

zkat · 2021-09-15T14:36:37Z

I'm not a C developer so I don't really know what would be best. My thoughts on C are that there are two main use cases:

direct C usage by C developers for C applications
Other language wrappers around C

The first seems like it would work really well with what you're proposing here. The latter might be better server by a very high performance streaming parser, so other languages can use C as a perf boost.

I think there's room for both so you can do whichever you find more fun! What do you think?

garrisonhh · 2021-09-15T15:54:52Z

I'm more comfortable with the first use case, but writing something focused on performance and language interop sounds absolutely dope! I'll give it a shot and get back to you

zkat · 2021-09-16T01:30:08Z

@garrisonhh one thing that might be of interest is https://github.com/lucretiel/kaydle, which has a Rust implementation optimized for streaming parsing. It might be too Rust-flavored to work for C, tho?

swift2plunder · 2021-09-16T04:09:09Z

The basic interface for language wrappers is a stack machine. SWIG bindings and FastFFI bindings are a little more complicated, but I took a quick look at the API for the library you linked and that style would be great

Your JSON library's access and modification APIs could reasonably be copied here, but it would be even better to have an API informed by the query specification

garrisonhh · 2021-09-17T12:49:45Z

Cool, I don't really know rust but the structure gives me some ideas. I'm creating a low-level token stream and then going to build higher-level stuff on top of that. I would love to recreate a similar API to the json lib, but it's only really as convenient as it is due to the DOM structure - I'll see what I can do though. Using the query specification also makes a lot of sense. Thanks for the input!

garrisonhh · 2021-09-21T19:36:52Z

Hey guys, a couple questions about the spec:

Are args/props ordered?
Are nodes within a document or child block ordered?
Could all types of non-null values be allowed as keys? The syntax is very clear when something is a node id or a prop id vs. a value, so I think allowing my-node 3=4 true=false could be pretty cool

I also think making type annotations mandatory could make some sense, if they can't be relied upon consistently across languages why use them outside of expressing intent?

Outside of that, I'm starting to get somewhere with my implementation, I've arrived at a design that I like, and as far as performance goes it's gonna be pretty goddamn fast. If you are aware of any large-scale KDL files (like 50MB+) that I can use for testing I would really appreciate it! I can always do some python magic if none exist that you know of.

zkat · 2021-09-21T19:39:44Z

args are ordered, props are not.
props follow a last-prop-wins rule for duplicate entries.
children/document nodes MUST be ordered
keywords cannot be identifiers (null, true, false)

eXeC64 · 2021-09-27T22:53:48Z

Hello all. It seem garrisonhh and I took the up the challenge to write a C parser for KDL at around the same time. However I didn't see or comment in this thread, so didn't see I was duplicating efforts with garrisonhh. That said, I don't think there's any harm in having multiple implementations available. It sounds like we've got slightly different priorities anyway; I've aimed for maximum portability without worrying much about performance, while garrisonhh's solution sounds like it'd perform better.

Mine's more or less "done" and passing a fairly thorough test suite, but is not yet battle hardened. Reviews would be appreciated:

https://git.sr.ht/~exec64/freekdl

One big difference I noticed with my implementation to the official test suite is that the suite allows for type annotations to be quoted strings, but the spec as I understand it (and my implementation) only permit bare identifiers to be type annotations.

garrisonhh · 2021-09-27T23:24:48Z

Very cool! The API looks clean at a glance and from what I can see in the test file. I think it's fascinating that we independently used similar terminology like "consume" and "expect". I think it would be awesome for critique if you also provided more examples of usage in the README or as source files!

I'm definitely very zoned in on performance and memory usage so I don't think we're really stepping on each others toes. I'm pretty close to a state I'd feel comfortable sharing, so we can compare notes then!

eXeC64 · 2021-09-27T23:31:34Z

Thanks. The main thing I'd like to do is enhance test coverage a bit more (specifically on input that should fail to parse), and improve error message output. Right now error messages at least say where in the input it failed, and roughly what went wrong, but it could definitely be more informative. There's also a few code cleanups that could happen, as the design was developed fairly organically beyond knowing that I'd have a distinct tokeniser stage.

The other main thing my implementation is lacking is convenience functions for numbers. I currently only expose them as strings, but should provide a few helper functions to get/set them as double, uint32_t, int32_t, uint64_t, int64_t.

Lastly, I'd like to aim a fuzzer, valgrind, and other tooling at it to find any possible crashes or vulnerabilities.

tjol · 2022-09-16T22:12:41Z

I thought this sounded like a fun week-end project...

... rather more than a weekend later, I have a library that appears to work (and passes the test suite here), but probably still handles some edge cases incorrectly. In a burst of creative naming, I called it ckdl - yeah, I know.

https://github.com/tjol/ckdl

I went a very different route to @eXeC64 in terms of API design: instead of reading the document into a C structure, ckdl provides a stream of high-level parse events much like SAX does for XML. The idea here is that while this may be inconvenient for C applications, it's great for wrapper libraries in other languages, because you get to load the data into your own structures pretty directly.

Since the idea was to enable wrapper libraries in other languages, I went ahead and coded up minimal wrappers for Python and C++.

zkat · 2022-09-16T22:16:42Z

@tjol this is really awesome! Please go ahead and PR this to this repo's README as a C implementation (and do the same for https://github.com/kdl-org/kdl-org.github.io/blob/main/src/_includes/partials/implementations.md)!

I'm gonna close this issue for now, since we now actually have a C implementation, even if it'll be nice to have the struct-based one too :)

zkat · 2022-09-16T22:21:06Z

oh and you probably wanna add it to Python and C++ too :)

zkat added help wanted Extra attention is needed implementations labels Sep 15, 2021

zkat mentioned this issue Sep 15, 2021

KDL 1.0.0 compliance for implementations #144

Closed

13 tasks

zkat closed this as completed Sep 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

C Implementation #160

C Implementation #160

zkat commented Sep 15, 2021

garrisonhh commented Sep 15, 2021 •

edited

Loading

zkat commented Sep 15, 2021

garrisonhh commented Sep 15, 2021

zkat commented Sep 16, 2021

swift2plunder commented Sep 16, 2021

garrisonhh commented Sep 17, 2021

garrisonhh commented Sep 21, 2021

zkat commented Sep 21, 2021

eXeC64 commented Sep 27, 2021

garrisonhh commented Sep 27, 2021

eXeC64 commented Sep 27, 2021

tjol commented Sep 16, 2022

zkat commented Sep 16, 2022

zkat commented Sep 16, 2022

C Implementation #160

C Implementation #160

Comments

zkat commented Sep 15, 2021

garrisonhh commented Sep 15, 2021 • edited Loading

zkat commented Sep 15, 2021

garrisonhh commented Sep 15, 2021

zkat commented Sep 16, 2021

swift2plunder commented Sep 16, 2021

garrisonhh commented Sep 17, 2021

garrisonhh commented Sep 21, 2021

zkat commented Sep 21, 2021

eXeC64 commented Sep 27, 2021

garrisonhh commented Sep 27, 2021

eXeC64 commented Sep 27, 2021

tjol commented Sep 16, 2022

zkat commented Sep 16, 2022

zkat commented Sep 16, 2022

garrisonhh commented Sep 15, 2021 •

edited

Loading