RPM metadata library #1

Open
dralley opened this issue Mar 13, 2021 · 23 comments

@dralley

dralley commented Mar 13, 2021

I just wanted to mention - I just discovered "repomd" a second ago, but I've been working on a similar library on and off for a couple of months.

https://github.com/dralley/rpmrepo_rs/

It's in a similar state of non-completion, but I'm aiming for feature parity with the createrepo_c libraries. Right now I have full serialization and deserialization for repomd.xml and filelists.xml and the others are in-progress, and structurally it needs a lot of cleanup, proper error handling, better testing, etc. Anyways, I already spoke with the author of another similar uncompleted library and we were considering merging the two.

https://github.com/semtexzv/rpmtools

Would you be interested in doing the same? There's not a lot of sense in duplicating effort.

@drahnr
Owner

drahnr commented Mar 13, 2021

@dralley thanks for reaching out! I would very much appreciate a combined effort :) could we extend the license to Apache-2.0 and MIT besides MPL?

@dralley
Author

dralley commented Mar 13, 2021

Probably, if there is a good reason to? I can't say I have a perfect understanding of all the legal nuances but my perception is that MPLv2 provides all the fancy patent protections and so forth that the Apache 2.0 license provides, plus a few minor copyleft protections, without being a massive headache to work with like the LGPL and similar licenses. Which is why I happen to like it a lot for Rust code.

What are the benefits of MIT + Apache dual license?

@drahnr
Owner

drahnr commented Mar 13, 2021

The motivation is mostly that it's the de facto standard license combination for Rust projects, and I have read both licenses, which I have not done for the MPL. I don't have a strong opinion about this, as long as the license is compatible with MIT + Apache-2.0, so it's mostly a compatibility concern.

@drahnr
Owner

drahnr commented Mar 13, 2021

I had a quick peek into your code, but I have to spend some more time and we should talk about your design goals.

@dralley
Author

dralley commented Mar 13, 2021

Understood. It's certainly less well known simply due to being a younger license.

In terms of design, it's a bit of a mess at the moment, because I started by making a simple application for downloading RPM repositories. Then I expanded to trying to make a createrepo_c clone, but since XML doesn't fit all that well into the serde model, I could never get it to write the XML properly... so now I'm doing the XML parsing and writing manually, using quick-xml alone. That's where I'm at currently.

I want to split into multiple crates but haven't yet. Obviously it's not great to have library + application code mixed together like this.
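To illustrate what "manual" writing means here, below is a minimal sketch using only the Rust standard library. It is not the code from either repository and it does not use quick-xml; the `RepomdRecord` struct, its fields, and the function names are hypothetical, and a real `<data>` entry in repomd.xml carries more fields than shown.

```rust
use std::fmt::Write;

/// Hypothetical, simplified record for one `<data>` entry in repomd.xml.
struct RepomdRecord {
    kind: String,     // e.g. "primary", "filelists", "other"
    checksum: String, // hex digest of the metadata file
    location: String, // path relative to the repository root
    size: u64,        // size of the metadata file in bytes
}

/// Escape the characters that are special in XML text and attribute values.
fn escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Write one record element by element, in a fixed order.
fn write_record(out: &mut String, rec: &RepomdRecord) -> std::fmt::Result {
    writeln!(out, "  <data type=\"{}\">", escape(&rec.kind))?;
    writeln!(out, "    <checksum type=\"sha256\">{}</checksum>", escape(&rec.checksum))?;
    writeln!(out, "    <location href=\"{}\"/>", escape(&rec.location))?;
    writeln!(out, "    <size>{}</size>", rec.size)?;
    writeln!(out, "  </data>")
}

/// Write a whole (simplified) repomd document into a String.
fn write_repomd(records: &[RepomdRecord]) -> Result<String, std::fmt::Error> {
    let mut out = String::new();
    writeln!(out, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>")?;
    writeln!(out, "<repomd xmlns=\"http://linux.duke.edu/metadata/repo\">")?;
    for rec in records {
        write_record(&mut out, rec)?;
    }
    writeln!(out, "</repomd>")?;
    Ok(out)
}

fn main() {
    // Placeholder values, not taken from a real repository.
    let records = vec![RepomdRecord {
        kind: "primary".to_string(),
        checksum: "0123abcd".to_string(),
        location: "repodata/primary.xml.gz".to_string(),
        size: 1024,
    }];
    print!("{}", write_repomd(&records).expect("writing to a String should not fail"));
}
```

The point of the sketch is the trade-off: every element and its order are spelled out by hand, which is more tedious than a derive-based approach but gives full control over the exact output.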

@dralley
Author

dralley commented Mar 13, 2021

It looks like the original reasoning for MIT + Apache was that they needed the patent and trademark protections from the Apache license, but the FSF claims that Apache 2.0 isn't GPLv2 compatible. Dual licensing with MIT solves that problem.

https://internals.rust-lang.org/t/rationale-of-apache-dual-licensing/8952

And then in another thread supposedly Graydon Hoare wanted to stick to well-known licenses, and the MPLv2 was only a couple of months old at the time (2012).

@dralley
Author

dralley commented Mar 14, 2021

It occurs to me that I should look into the licensing more, anyways.

I've spent a fair number of hours contributing to the createrepo_c project as well (which is the canonical library for manipulating RPM metadata - which is covered by the GPL). My code is completely different in basically every respect, but the GPL FAQ draws a fuzzy border.

https://www.gnu.org/licenses/gpl-faq.html#TranslateCode

It's not exactly clear what "translate" means in a context where the internal structures and patterns are totally different, but I still have knowledge of how the original library works.

@drahnr
Owner

drahnr commented Mar 23, 2021

Imho the next steps would be to unify the source code under one umbrella org and review which parts of which crate are going to make it into a combined repo.

@dralley
Author

dralley commented Mar 23, 2021

I'm still trying to get some answers regarding the licensing weirdness. That section of the GPL FAQ paints with a very wide brush and while it doesn't look like the text of the license actually justifies it, I'd rather verify. In the meantime we should probably wait before doing any actual merging.

I invited the author of the other library (@semtexzv) to this thread in case he has any thoughts.

In terms of umbrella org, would you be opposed to asking https://github.com/rpm-software-management/ if we can host the repo there? They maintain createrepo_c, and librpm.rs is already hosted there (although I believe development is paused until the librpm C API is made more threadsafe). I work with them on a semi-regular basis, they might be willing to do so.

@semtexzv

Hey, I'm the author of https://github.com/semtexzv/rpmtools. The library I wrote was just an experiment, but I snagged the crates.io names for a future shared crate. I'd gladly point rpmrepo to a shared crate for reading/writing the metadata. The reading can be done using serde, but the writing probably can't, since serde_xml is not in a great state. It'll probably require using one of the SAX (event-based) XML libraries. As for licensing, Apache + MIT seems to be the best option in the Rust ecosystem.

TL;DR: Yeah, let's merge the datatype definitions + some standard, sane serializer / deserializer implementation into a library, license it MIT + Apache, and ask for it to be hosted under https://github.com/rpm-software-management/ once ready.

@dralley
Author

dralley commented Mar 24, 2021

> The reading can be done using serde, but the writing probably not, the serde_xml is not in a great state. It'll probably require using one of the SAX (event based) xml library.

Yup. That's the route I ended up going down. It's not so bad, really. quick-xml is pretty easy to work with, much more so than libxml or expat.

I'm also working on a PR upstream to add a higher-level (still manual) API for writing that's even easier & less tedious. tafia/quick-xml#278

@dralley
Author

dralley commented Apr 9, 2021

I asked about hosting at https://github.com/rpm-software-management/, and they said it can probably be done, however they generally have a strong preference for LGPL and similar licenses. It might be a harder sell.

I have reading + writing XML working now for primary, filelists, other, and repomd, and the API is slightly better than it was before. None of it is really tested yet though, and error handling is mostly nonexistent.

@dralley
Author

dralley commented Jun 18, 2021

FYI, I'm still working on this, just slowly. Work has ramped up a bit so I took a break for a few weeks.

@drahnr
Owner

drahnr commented Jun 18, 2021

Likewise, spare time has been very sparse. I plan to get back on this later this summer.

@dralley
Author

dralley commented Jul 16, 2021

It's still not quite ready, but it's getting close. The main problem is that I ended up needing to patch several external libraries, and I have to wait for those patches to be merged and released; otherwise it can't be built without local clones of those projects.

I've written enough tests to have good confidence that the metadata being generated is correct in the majority of cases.

@semtexzv I split it into multiple crates as you suggested.

@drahnr
Owner

drahnr commented Jul 16, 2021

@dralley you can use a [patch.crates-io] section with git overrides until upstream slates a new release.
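For reference, a minimal sketch of such an override in the workspace's root `Cargo.toml`. The crate shown (quick-xml, which has pending patches mentioned earlier in the thread), the fork URL, and the branch name are placeholders chosen for illustration, not the actual values used in this project.

```toml
# Root Cargo.toml of the workspace.
# Replaces the crates.io release of a dependency with a git checkout
# for local builds. URL and branch below are hypothetical placeholders.
[patch.crates-io]
quick-xml = { git = "https://github.com/your-user/quick-xml", branch = "your-fix-branch" }
```

A `[patch]` section only affects builds of the workspace that declares it; it is not applied for downstream users of a published crate.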

@dralley
Author

dralley commented Aug 7, 2021

I tried this, but it seems that crates.io doesn't allow publishing packages with git repository dependencies.

@drahnr
Owner

drahnr commented Aug 10, 2021

Uh, yeah, I meant for local development to get things moving faster, rather than publishing crates. Eventually one needs to decide if upstream will ingest the required changes or if a workaround can be found or, worst case, a fork is needed.

@drahnr
Owner

drahnr commented Aug 29, 2021

@dralley I created a temporary fork of rpm-rs, aka rpm-rs-temporary, with a bunch of extensions and fixes; once/iff upstream picks up again, it'll be dropped.

@dralley
Author

dralley commented Feb 16, 2022

@drahnr I'm probably about 85% done with the metadata writing and parsing aspects, and about 50% done with Python bindings using pyo3.

The main three things that need improvement are tests, error handling, and rpm-rs integration. And waiting on quick-xml to release some patches.

And I guess advisory / errata parsing, since that's still nowhere near complete, but it's a little less important. I'm pretty happy with how it looks.

@dralley
Author

dralley commented Feb 16, 2022

@semtexzv Do you still have any interest in your rpmtools libraries?

@drahnr
Owner

drahnr commented Feb 16, 2022

I do. The question here is more about the next steps, since you were already elbow deep in refactoring something last time I checked. Happy to pitch in in a few weeks.

@dralley
Author

dralley commented Feb 17, 2022

No I meant @semtexzv, since I noticed he moved from Red Hat to Google and I've heard that Google have weird uptight rules around open source projects. I'd like to try to maintain them, or at least some parts of them, like the repo downloading bit.

I don't know if you (Michal) would be willing to transfer the crate name `rpmrepo` to me? I have a Python tool by the same name that uses createrepo_c, but now that my library is getting ready it is finally feasible to use Rust for the whole thing.

And at some point I would still like to move it all over to the rpm-software-management org, but they're a bit busy right now, and that conversation would be easier once the library is 100% complete for at least a subset of the functionality.
