Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mesos: 0.28.2 -> 1.0.1 #19064

Merged
merged 1 commit into from
Nov 23, 2016
Merged

mesos: 0.28.2 -> 1.0.1 #19064

merged 1 commit into from
Nov 23, 2016

Conversation

cstrahan
Copy link
Contributor

Motivation for this change

The mesos package was marked broken due to being an old, insecure version, so let's bump it up to the latest version.

Things done
  • Tested using sandboxing
    (nix.useSandbox on NixOS,
    or option build-use-sandbox in nix.conf
    on non-NixOS)
  • Built on platform(s)
    • NixOS
    • OS X
    • Linux
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

/cc @kamilchm

@cstrahan cstrahan added 2.status: work-in-progress This PR isn't done 8.has: package (update) This PR updates a package to a newer version labels Sep 29, 2016
@cstrahan cstrahan force-pushed the mesos-1.0.1 branch 3 times, most recently from 5dbd36a to 198de26 Compare September 29, 2016 02:16
@cstrahan
Copy link
Contributor Author

One thing I'm curious about:

We use "${bash}/bin/bash" in some places, and stdenv.shell in others. I wonder if we can just use stdenv.shell in all cases...

@cstrahan cstrahan changed the title nixos: 0.28.2 -> 1.0.1 mesos: 0.28.2 -> 1.0.1 Sep 29, 2016
@kamilchm
Copy link
Member

Basic hello world works for me. I want to test it more later with Marathon 1.3 using new containerizer and CNI.
We can merge it now like it is and fix things if we find bugs in more advanced usage.

BTW is there a script that generates maven dependencies list uses as maven repo?

@cstrahan
Copy link
Contributor Author

BTW is there a script that generates maven dependencies list uses as maven repo?

Unfortunately, no. I manually went into src/java and tried building the pom with the maven repo overriden to .maven, which caused all of the pom/jar/sha1 files to be fetched. Then it was just a matter of running find .maven -name '*.pom' -o -name '*.sha1' -o -name '*.jar' and cleaning up the paths, and stuffing that back in the script. A little tedious, I'm afraid.

In the time since I first packaged Mesos, @shlevy wrote https://github.com/NixOS/mvn2nix-maven-plugin, which we might want to take a stab at using instead.

@kevincox
Copy link
Contributor

kevincox commented Oct 2, 2016

I don't know why but all tasks fail to start on this version for me. I get the following message on stderr of the task. I'm using the linux executor to simply run a command.

I1002 15:04:23.098440 31274 exec.cpp:161] Version: 1.0.1
E1002 15:04:23.098924 31284 process.cpp:2105] Failed to shutdown socket with fd 7: Transport endpoint is not connected
I1002 15:04:23.098961 31277 exec.cpp:495] Agent exited ... shutting down

@benley
Copy link
Member

benley commented Oct 9, 2016

@kevincox is it possible that you need to set LIBPROCESS_IP somewhere? That seems to be a common fix for that type of error.

kamilchm referenced this pull request in apache/mesos Oct 10, 2016
@kamilchm
Copy link
Member

kamilchm commented Oct 10, 2016

@kevincox I run it with NixOps and VirtualBox (https://github.com/kamilchm/nix-mesos/blob/master/single-vm.nix) and simple commands works for me after applying this patch apache/mesos@d25cafa
Docker is fine even without it.
I'll be testing universal containerizer now.

@kamilchm
Copy link
Member

There are a lot of things to do here.
New mesos containerizer needs its own dependencies that I'm trying to track now.
I've encountered problems with sandbox permissions when I set marathon user other than root.
Mesos fetcher needs gzip in PATH to unpack uris.
Furthermore, we should to rename mesos-slave service to mesos-agent to follow changes in 1.0.

@cstrahan
Copy link
Contributor Author

@kamilchm

New mesos containerizer needs its own dependencies that I'm trying to track now.

Which dependencies? (unless you mean e.g. gzip, as you point out later on)

I've encountered problems with sandbox permissions when I set marathon user other than root.

I'll have to try that out; any idea what's going on there?

Mesos fetcher needs gzip in PATH to unpack uris.

I tried to patch out all the relative paths to executables -- did I miss some?

Furthermore, we should to rename mesos-slave service to mesos-agent to follow changes in 1.0.

👍

@kamilchm
Copy link
Member

After testing few applications I found only one thing that could be critical for non-docker containers - gzip for mesos-slave.
It can't be patched in mesos source because it comes from the tar itself #13783
I started to do wrapProgram and add gzip PATH to mesos-slave/mesos-fetcher/mesos-containerizer. It solves gzip issue.
But I feel it to be inconsistent to do source patching for some programs and wrapProgram for others :/
What do you think about getting rid of nixos.patch and replacing all the substituteInPlace by wrapProgram to add all the runtime dependencies to PATH?

@cstrahan
Copy link
Contributor Author

What do you think about getting rid of nixos.patch and replacing all the substituteInPlace by wrapProgram to add all the runtime dependencies to PATH?

I know that @edolstra prefers patching paths over wrapping executables, and I tend to agree. An alternative to wrapping the mesos executables would be providing one wrapper for tar (local to this package) so that it resolves gzip.

I don't have strong feelings either way, though. How does that sound, vs wrapping mesos?

@kamilchm
Copy link
Member

Local tar wrapper sounds fair.

@philip-wernersbach
Copy link

You all are doing good work. I'm willing to assist in whatever way is needed. I use the native Mesos containerizer, so I can test that.

@cko cko mentioned this pull request Oct 20, 2016
7 tasks
@kamilchm
Copy link
Member

kamilchm commented Oct 20, 2016

One more problem, when I try to run https://github.com/apache/mesos/blob/master/src/examples/python/test_framework.py I get:

[libprotobuf ERROR google/protobuf/descriptor_database.cc:57] File already exists in database: mesos/mesos.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:1018] CHECK failed: generated_database_->Add(encoded_file_descriptor, size): 
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what():  CHECK failed: generated_database_->Add(encoded_file_descriptor, size): 
Aborted

I'm trying to write test framework for most user cases and run it as nixos test. Here's current state of tests https://gist.github.com/kamilchm/a03a49639eca49f19c4b3e785945630b#file-mesos-nix

@kamilchm
Copy link
Member

@cstrahan do you have any idea how to fix the problem with python bindings?
It looks like standard mesos build links ext python modules less dynamically. _executor.so and _shceduler.so are smaller in nix than when I build it outside nix.
There's easy test for it:

$ nix-shell -p mesos
$ echo -e "from mesos.executor import MesosExecutorDriver\nfrom mesos.scheduler import MesosSchedulerDriver\n" | python

I can't find a way to fix it :/

@kamilchm
Copy link
Member

I prepared few test which works with 0.28 and fails on python bindings with 1.0.1 (#19825).

@philip-wernersbach
Copy link

philip-wernersbach commented Oct 24, 2016

I did just tried to deploy a Redis container on a 3-node Mesos + Marathon cluster, and I get this error:

Failed to launch container: Failed to perform 'curl': curl: (77) error setting certificate verify locations: CAfile: /etc/ssl/certs/ca-certificates.crt CApath: none ; Container destroyed while provisioning images

Every node is running the latest CentOS 7 release and the latest Nix and Nixpkgs. Looks like somewhere we have a curl impurity, curl should use the CA certificates supplied by Nix.

@philip-wernersbach
Copy link

philip-wernersbach commented Oct 25, 2016

I found two more issues with this:

  1. nixos.patch replaces cp with @cp@ in src/launcher/fetcher.cpp, but the Nix expression doesn't substituteInPlace @cp@ in src/launcher/fetcher.cpp. --subst-var-by cp ${coreutils}/bin/cp needs to be added to the substituteInPlace command for src/launcher/fetcher.cpp.
  2. nixos.patch breaks the Mesos containerizer by replacing os::Shell::name in 3rdparty/stout/include/stout/os/posix/shell.hpp with an absolute path to the Nix package that mesos is built with. The Mesos containerizer executes the shell referenced by os::Shell::name as the container command, so unless the container has the exact same Nix shell package in it, it won't work. I was able to work around this by adding a new variable to os::Shell that is set to sh, and then changing src/slave/containerizer/mesos/launch.cpp to reference this new variable instead of os::Shell::name. This is a quick workaround, but probably not the best way to do it.

@philip-wernersbach
Copy link

It's worth noting that the first issue is probably present in the current mesos expression in Nixpkgs.

@cstrahan
Copy link
Contributor Author

Good catch. Thanks for pointing those out, I'll try to update the PR
tonight, and write some NixOS tests within the next day or two.

On Tue, Oct 25, 2016, at 07:36 PM, Philip Wernersbach wrote:

It's worth noting that the first issue is probably present in the
current mesos expression in Nixpkgs.
— You are receiving this because you were mentioned. Reply to this
email directly, view it on GitHub[1], or mute the thread[2].

Links:

  1. mesos: 0.28.2 -> 1.0.1 #19064 (comment)
  2. https://github.com/notifications/unsubscribe-auth/AAIybjtQQo8PUBjMJEmulNjHJej9dQs7ks5q3pJ7gaJpZM4KJgCt

@kamilchm
Copy link
Member

kamilchm commented Oct 26, 2016

Error in python bindings is related to protocolbuffers/protobuf#1489

@cstrahan
Copy link
Contributor Author

Per the comment on protocolbuffers/protobuf#1489, we might want to see if the mesos devs could use protobuf-lite instead.

@kamilchm kamilchm mentioned this pull request Nov 2, 2016
@kamilchm
Copy link
Member

kamilchm commented Nov 2, 2016

As I understand the protobuf issue it could affect many more packages and I don't think we can convince all 3rd party developers to move from protobuf to protobuf-lite.

Anyway, I'm for merging Mesos 1.0.1 like it is here and file a new bug for python bindings. It'll be better than broken Mesos like we have it now.

@copumpkin
Copy link
Member

Yeah, please merge and fix incrementally! 😄

@kamilchm
Copy link
Member

@cstrahan are you working on it?

@kamilchm
Copy link
Member

kamilchm commented Nov 17, 2016

@copumpkin we need someone with merge permissions to move it forward.

I can rewrite python test with new 1.0 HTTP API client https://github.com/douban/pymesos and drop native python bindings in mesos package later.

@ibrahimsag
Copy link
Contributor

any help needed here?

@cstrahan
Copy link
Contributor Author

Sorry gang, life has been rather crazy and I've had to stand back for a bit.

Since everyone seems to be happy with what's here, I'll go ahead and merge.

@cstrahan cstrahan merged commit ecf3098 into NixOS:master Nov 23, 2016
@cstrahan cstrahan mentioned this pull request Dec 25, 2016
7 tasks
@globin globin mentioned this pull request Sep 24, 2017
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.has: package (update) This PR updates a package to a newer version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants