GPL dependencies #9898

Closed
ryw opened this issue Jul 20, 2020 · 19 comments
Comments

@ryw
Member

ryw commented Jul 20, 2020

We need to avoid GPL.

Apache 2 software can therefore be included in GPLv3 projects, because the GPLv3 license accepts our software into GPLv3 works. However, GPLv3 software cannot be included in Apache projects. The licenses are incompatible in one direction only, and it is a result of ASF’s licensing philosophy and the GPLv3 authors’ interpretation of copyright law.

Snyk is reporting that the following dependencies in our requirements.txt for Python 3.6, 3.7 and 3.8 are GPL v3:

  • jaydebeapi v1.2.3
  • mysql-connector-python v8.0.18
  • pysmbclient v0.1.5
  • unidecode v1.1.1
  • yamllint v1.23.0

I see some previous discussion + mitigation for unidecode but not all of these.
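
A report like the one above can be roughly cross-checked locally. The following is only a minimal sketch, not how Snyk works: it assumes the licence is declared in the package metadata (`License` field or trove classifiers), and the helper names `looks_gpl` and `gpl_like_distributions` are made up for illustration.

```python
from importlib import metadata


def looks_gpl(license_field, classifiers):
    """Return True if the declared licence text mentions the GPL family.

    Note: the plain substring match also flags LGPL and AGPL.
    """
    haystack = " ".join([license_field or "", *classifiers]).upper()
    return "GPL" in haystack or "GNU GENERAL PUBLIC LICENSE" in haystack


def gpl_like_distributions():
    """List installed distributions whose metadata declares a GPL-family licence."""
    flagged = []
    for dist in metadata.distributions():
        meta = dist.metadata
        classifiers = meta.get_all("Classifier") or []
        if looks_gpl(meta.get("License"), classifiers):
            flagged.append(f"{meta['Name']}=={dist.version}")
    return sorted(flagged)
```

Calling `gpl_like_distributions()` inside an environment built from requirements.txt returns the flagged packages. Packages that do not declare a licence in their metadata are missed, so this is a first pass only.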

@ryw ryw added kind:feature Feature Requests area:licensing and removed kind:feature Feature Requests labels Jul 20, 2020
@ryw ryw changed the title GPL requirements GPL dependencies Jul 20, 2020
@apache apache deleted a comment from boring-cyborg bot Jul 20, 2020
@turbaszek
Member

Should we raise issues in those projects?

@ryw
Member Author

ryw commented Jul 20, 2020

I doubt the projects would be open to changing their licenses, but maybe worth opening an issue to ask.

We likely need to look for ways to factor them out of the project.

  • If the requirements are needed for core functionality, we'd need to find or create a replacement.
  • If the requirements power some ancillary functionality, we could ask users to include those libraries in their own requirements.txt, thus keeping Airflow itself clean from GPL.
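
The second option (users install the GPL library themselves) is often paired with a lazy import that fails with a helpful message. This is a minimal sketch of that general pattern, not Airflow's actual code; `require_optional` is a hypothetical helper name.

```python
import importlib


def require_optional(module_name, pip_name):
    """Import an optional module, or explain how the user can install it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is an optional dependency and is not bundled; "
            f"install it yourself, e.g. `pip install {pip_name}`."
        ) from exc
```

A hook that needs a GPL-licensed driver would call something like `require_optional("jaydebeapi", "JayDeBeApi")` only when the hook is actually used, so the core package never imports or ships the library.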

@potiuk
Member

potiuk commented Jul 20, 2020

Sure we should review those. I don't think there is anything to raise in those projects ... if they are using GPL licence, that's their choice.

And it's not all black-and-white use/no use. But luckily we are perfectly covered and ASF tells us exactly what to do.
The restriction of GPL which belongs to so called "category X" is very precisely described here: https://www.apache.org/legal/resolved.html#category-x. And it's quite clear that this is perfectly OK to have requirements (in form of dependencies) as long as a) we do not redistribute the code or binary and b) this is an optional feature of our software. More details follow:

  1. We cannot distribute the dependency in either form (source or binary). But we can use it (otherwise we would not be able to use Linux, as its kernel is GPL). The specific comment in the Apache licensing policy is "For example, using a GPL'ed tool during the build is OK, however including GPL'ed source code is not."

  2. THEY MAY BE RELIED UPON WHEN THEY SUPPORT AN OPTIONAL FEATURE
    Optional means that the component is not required for standard use of the product or for the product to achieve a desirable level of quality. The question to ask yourself in this situation is:
    "Will the majority of users want to use my product without adding the optional components?"

In light of the above:

  1. Yamllint is fine - we are using it as a build tool, but we do not redistribute it, nor is it needed for Airflow to run (at all).

  2. mysql-connector-python v8.0.18 - that's an interesting one. We also have mysqlclient (also GPL) for connecting from the MySQL operator. But we do not rely on either of them to connect to our metadata store, even if MySQL is used as the backend. That depends entirely on the configuration of the SQLAlchemy connection string. There are many drivers you can use for MySQL; for example, https://github.com/PyMySQL/PyMySQL is MIT-licensed.

  3. Pysmbclient is clearly optional.

  4. unidecode. We have an explanation in the Changelog that this is an optional feature. It is a transitive (and optional) dependency of nvd3 (which we used to have vendored in and modified to not load it). So nvd3/slugify now will only use unidecode if it is installed in the system and it is not necessary for it to run.

### SLUGIFY_USES_TEXT_UNIDECODE or AIRFLOW_GPL_UNIDECODE no longer required

It is no longer required to set one of the environment variables to avoid
a GPL dependency. Airflow will now always use text-unidecode if unidecode
was not installed before.
  5. JayDeBeApi is used by the JDBC hook. It is also optional.

I think we are good.
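
To illustrate the MySQL point above: in a SQLAlchemy URL the driver is the part after the `+` in the scheme, so the choice of driver (and therefore of its licence) is a configuration matter. The sketch below only parses the URL with the standard library; the helper `mysql_driver` and the mapping are illustrative, and the licences listed are the ones stated in this thread.

```python
from urllib.parse import urlsplit

# SQLAlchemy driver names mapped to the licences mentioned in this thread.
MYSQL_DRIVER_LICENCES = {
    "mysqldb": "GPL",         # mysqlclient
    "mysqlconnector": "GPL",  # mysql-connector-python
    "pymysql": "MIT",         # PyMySQL
}


def mysql_driver(conn_uri):
    """Return the driver named in a MySQL SQLAlchemy URL, or None for other databases."""
    scheme = urlsplit(conn_uri).scheme  # e.g. "mysql+pymysql"
    dialect, _, driver = scheme.partition("+")
    if dialect != "mysql":
        return None
    # With no explicit driver, SQLAlchemy defaults to mysqldb (mysqlclient).
    return driver or "mysqldb"
```

For example, `mysql_driver("mysql+pymysql://user:pass@localhost/airflow")` names the MIT-licensed driver, while a bare `mysql://` URL falls back to mysqlclient.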

@ryw
Member Author

ryw commented Jul 20, 2020

Thanks @potiuk - I'll submit a documentation PR around this.

@ryw ryw self-assigned this Jul 20, 2020
@potiuk
Member

potiuk commented Jul 20, 2020

Great idea @ryw ! I think by just discussing and documenting/reviewing it we might find some issues we were not aware of. For example I just double checked and I think we should not include "mysql" extra in the binary Docker image of ours.

I just re-read the statement "Apache projects may not distribute Category X licensed components, be it in source or binary form; and be it in ASF source code or convenience binaries". The binary Docker image is a "convenience binary" and currently it does contain "mysql" extra as default -> thus the GPL libraries. So we should likely release a new image very soon (and possibly deprecate the old images).

I wonder what others think?

We can re-release the images again taking released "pypi" package and latest Dockerfiles. That is possible.

WDYT?

@kaxil
Member

kaxil commented Jul 20, 2020

We will have to review all the docker dependencies too in that case

https://github.com/apache/airflow/blob/master/Dockerfile#L94-L125

We might be good but we should probably start a thread with ASF to check how they treat docker type convenience binaries

@mik-laj
Member

mik-laj commented Jul 25, 2020

Unidecode has alternatives that use a different license model.
https://pypi.org/project/text-unidecode/
I think it shouldn't be a problem to use it instead of the original one. We will not lose any functionality this way, but we will have to contribute to a few projects.
A similar solution exists in jsonschema.
https://github.com/Julian/jsonschema/blob/9d5edb4749ab1f6194aa5c7c099c6e6fd402c4cf/jsonschema/_format.py#L305-L324
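
The "use whichever implementation is installed" approach referred to here usually takes the form of a guarded import. The following is a rough sketch of that pattern, not the actual slugify or jsonschema code: it prefers `unidecode` when present and otherwise uses `text_unidecode`, as described earlier in this thread. The final standard-library fallback and the name `to_ascii` are assumptions added so the example runs anywhere.

```python
import unicodedata

try:
    # GPL-licensed package: used only if the user has installed it themselves.
    from unidecode import unidecode as _transliterate
except ImportError:
    try:
        # Drop-in alternative under a different licence.
        from text_unidecode import unidecode as _transliterate
    except ImportError:
        def _transliterate(text):
            # Assumed last-resort fallback: strip accents, drop other non-ASCII.
            normalized = unicodedata.normalize("NFKD", text)
            return normalized.encode("ascii", "ignore").decode("ascii")


def to_ascii(text):
    """Transliterate text to ASCII with whichever backend is available."""
    return _transliterate(text)
```

Because both packages expose a function called `unidecode`, the calling code does not need to know which one was imported.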

@potiuk
Member

potiuk commented Jul 26, 2020

@mik-laj : Unidecode is a transitive dependency of nvd3 -> slugify, so short of vendoring those in (which we did in the past) we cannot do much about it. But this is one of the reasons why we do not have to do it: slugify actually uses text-unidecode, which is available under the Artistic License. While that licence has been criticised in the past for being too vague (https://en.wikipedia.org/wiki/Artistic_License), IMHO, unlike with GPL, we should not have any problems with it.

Here is the relevant fragment of the pipdeptree output:

- python-nvd3 [required: ~=0.15.0, installed: 0.15.0]
    - Jinja2 [required: >=2.8, installed: 2.11.2]
      - MarkupSafe [required: >=0.23, installed: 1.1.1]
    - python-slugify [required: >=1.2.5, installed: 4.0.1]
      - text-unidecode [required: >=1.3, installed: 1.3]
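
To see through which parents a package is pulled in, a fragment like the one above can be walked by indentation. This is a small sketch that assumes the exact `- name [required: ...]` layout quoted here; `dependency_chain` is a hypothetical helper, not part of pipdeptree.

```python
FRAGMENT = """\
- python-nvd3 [required: ~=0.15.0, installed: 0.15.0]
    - Jinja2 [required: >=2.8, installed: 2.11.2]
      - MarkupSafe [required: >=0.23, installed: 1.1.1]
    - python-slugify [required: >=1.2.5, installed: 4.0.1]
      - text-unidecode [required: >=1.3, installed: 1.3]
"""


def dependency_chain(tree_text, target):
    """Return the list of packages from the top level down to `target`, or None."""
    stack = []  # (indent, name) pairs for the current branch
    for line in tree_text.splitlines():
        stripped = line.lstrip()
        if not stripped.startswith("- "):
            continue
        indent = len(line) - len(stripped)
        name = stripped[2:].split(" [", 1)[0].strip()
        # Leave any branch that is at the same depth or deeper than this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, name))
        if name.lower() == target.lower():
            return [pkg for _, pkg in stack]
    return None
```

Here `dependency_chain(FRAGMENT, "text-unidecode")` gives the path python-nvd3, python-slugify, text-unidecode, which matches the nvd3 -> slugify chain described in the comment.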

@potiuk
Member

potiuk commented Jul 26, 2020

@kaxil -> yep, I am looking at it now and will send an email to discuss it on the devlist

@potiuk
Member

potiuk commented Jul 26, 2020

Hey @kaxil. I did a quick review of what we have in the image, and I believe we would not be able to release ANY image which would not contain GPL code :). Starting from Glibc which is LGPL and including those basic tools and libs:

apt
freetds
gnupg
gosu
libffi6
libsasl (GPL + BSD-4) also X licence
netcat
rsync
unixodbc

So I really doubt we should worry about binaries included in the Docker image :). I will open an issue with Infra and try to find the right person to talk to at ASF, but I think it's not an issue at all.

@kaxil
Member

kaxil commented Jul 26, 2020

> Hey @kaxil. I did a quick review of what we have in the image, and I believe we would not be able to release ANY image which would not contain GPL code :). Starting from Glibc which is LGPL and including those basic tools and libs:
>
> apt
> freetds
> gnupg
> gosu
> libffi6
> libsasl (GPL + BSD-4) also X licence
> netcat
> rsync
> unixodbc
>
> So I really doubt we should worry about binaries included in the Docker image :). I will open an issue with Infra and try to find the right person to talk to at ASF, but I think it's not an issue at all.

Yeah, I agree, but I would also like to understand more about how ASF treats Docker images + Helm Chart in general in terms of licensing and releases. And if licensing is not the issue, are we OK with using 3rd party Docker images? Same for the Helm Chart. Can you please send a link, or just mention me where you open the ticket? I am very much interested to learn more about this too :-)

@potiuk
Member

potiuk commented Jul 26, 2020

I found a very similar ticket in ASF legal which has been without a clear answer for over a year now. I commented on it and hope to get some answers and discussion - please add your concerns there as well - I think we should really get some clarification on this. (https://issues.apache.org/jira/browse/LEGAL-437?focusedCommentId=17165258&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17165258)

@kaxil -> I do not think this particular case is about the Helm Chart dependencies; that is quite a different story. For me, it's quite clear that the images we have now (specifically Astronomer's images) do not have the GPL problem we discuss above. I am not worried about the licence in this case - at least verbally, we got confirmation that we can use them. As @mik-laj mentioned, without an official confirmation from Astronomer we simply cannot use those images, because we cannot give our users the ability to build the same binaries from source code when we have no idea what licence covers it. But I am sure this could easily be solved by Astronomer explicitly adding licensing information in https://github.com/astronomer/ap-vendor/tree/master/statsd-exporter, for example (it is currently missing). And there is no doubt it can be solved (and it must be solved one way or another before we release the charts).

I think the case with the chart images is more about making sure that we can deliver to our customers source code that they can rely on when they want to rebuild all the binary images. It took me basically a weekend to find out how to build and rebuild all the images just from the official images in DockerHub + sources. The result is here: #9650. The whole problem I raised is, I think, the ability to easily rebuild all the binaries for our users. I strongly feel this is an important property of ASF software - binary images are just "convenience" packaging. I am specifically referring to this chapter from http://www.apache.org/legal/release-policy.html#what:

The Apache Software Foundation produces open source software. All releases are in the form of the source materials needed to make changes to the software being released. In some cases, binary/bytecode packages are also produced as a convenience to users that might not have the appropriate tools to build a compiled version of the source. In all such cases, the binary/bytecode package must have the same version number as the source release and may only add binary/bytecode files that are the result of compiling that version of the source code release.

So the lack of licensing in the Helm Chart has to be solved regardless (because we will not be able to release sources), but what I am looking for in the Helm Chart dependency discussion is bringing those sources, which today are not clear/easy/obvious to build, into "apache-airflow" controlled repos and making them "easy" to build.

@bolkedebruin
Contributor

Note: do we rely on any of those out of the box without an alternative? Because that's what matters. The default configuration of Airflow should not depend on (L)GPL artifacts and Airflow shouldn't distribute them.

We do not consider Docker images to be releases of Airflow, and as such they don't go through the same release process (we don't vote on them).

So what is the real issue here?

@kaxil
Member

kaxil commented Jul 26, 2020

Thanks for mentioning me :) - Hopefully we will get some clarity over this issue.

re: Helm Chart license - I wasn't actually talking specifically about Astronomer images; I just wanted to get some clarity on how we treat Docker images + Helm Chart, i.e. should they go through the release process + all licensing :)

@kaxil
Member

kaxil commented Jul 26, 2020

> Note: do we rely on any of those out of the box without an alternative? Because that's what matters. The default configuration of Airflow should not depend on (L)GPL artifacts and Airflow shouldn't distribute them.
>
> We do not consider Docker images to be releases of Airflow, and as such they don't go through the same release process (we don't vote on them).
>
> So what is the real issue here?

Thanks @bolkedebruin for commenting. There are 2-3 separate issues we are talking about here.

  1. The dependencies reported by Snyk that Ry has listed in the issue description:

    • jaydebeapi v1.2.3
    • mysql-connector-python v8.0.18
    • pysmbclient v0.1.5
    • unidecode v1.1.1
    • yamllint v1.23.0

    However, all these dependencies are extras and have to be installed with apache-airflow[mysql] for example, so that might not be an issue

  2. Whether we need to take care of the licenses of Docker image + Helm Chart dependencies, or whether we just treat them as convenience packages, since we currently do not vote on them.

  3. The ability to build the same binaries from source code when we have no idea what licence covers it (talking about the Docker images used in the Helm Chart). Should we bring the sources under the "apache/airflow" umbrella?

@potiuk
Member

potiuk commented Jul 26, 2020

@bolkedebruin -> there are three distinct problems here I think (started to write it in parallel to Kaxil but I think we have very close view on the scope of the questions we have). I renumbered my points to have the same numbers as Kaxil, and add my view/thinking for them.

  1. Question whether we depend on X licence (GPL basically) in order to get Airflow up and running, IMHO (similar to Kaxil) the answer is "no" - those are all optional libraries and not essential (extras or build tools).

  2. Question whether introducing several of those GPL binaries in our Docker image is "ok" in terms of releasing "convenience binaries" from Apache point of view. IMHO - this is also OK (same way as we are relying on glibc for example to run basically anything in the image)

  3. Question whether the users are able to use our sources released formally by us to build the software without depending on 3rd party "unofficially" released binaries/images ("official" is what I define by https://docs.docker.com/docker-hub/official_images/ ), I think this is not a licencing issue, this is more of a "user" convenience issue - how easy it is for our users to use sources we release to rebuild the software we release. I think we are good for that for the Dockerfile (source) -> it's easy to build docker image by the users.
    But it's not (currently) easy for the Helm Chart as it relies on a few external images that are not easy/clear to know how to build and we are not releasing sources to build them. IMHO - we should bring the sources in to rebuild those in and release the sources officially - together with the Helm Chart.

@bolkedebruin
Contributor

@potiuk I agree on all counts. As long as we don't stamp the Docker images "official", we are not in a gray area.

Also the dependencies mentioned for the docker images are either transient (apt, you can remove apt from the image) or optional (freetds).

@kaxil
Member

kaxil commented Jul 26, 2020

@bolkedebruin We are actually marking the Docker Images as "official Images" (in docs and different presentations) and hence we'd like some more clarity around that. (But they are official convenience packages - so I'd agree about GPL binaries being "ok" )

@kaxil
Member

kaxil commented Mar 30, 2021

Closing this for now -- Since these are not in default requirements. We can reopen if needed
