
Wickman's proposal for pants reorg #5

Closed

jsirois opened this issue Jul 30, 2013 · 5 comments

Comments

@jsirois
Contributor

jsirois commented Jul 30, 2013

This was originally filed here

When pants moves to its own repository, I'd like to take the opportunity to do a code reorg in order to make it easier for other projects / repositories / organizations to use pants.

current layout

Backends (java, scala, python, etc.) consist of tasks and targets, and basic wiring into phases. Right now the code org is something like this:

  • pants.bin
  • pants.base
  • pants.commands
  • pants.doc
  • pants.goal
  • pants.java
  • pants.python
  • pants.pants_doc
  • pants.targets
  • pants.tasks

A subset of these is what I consider to be "pants core" – as in pants simply can't operate without the code in there. Specifically this is the execution engine and ancillary stuff like base classes for everything else (Task, Goal, Target). Currently the Task/TaskError base classes are in pants.tasks.__init__ but should be moved into pants.base.

While pants/__init__.py contains some code necessary for pants to run (get_buildroot), most of it should be moved out to the leaves (the is_* helpers, e.g. is_java, is_scala), and 'from twitter.pants.targets import *' should go away entirely.

new backend layout

I propose that everything in pants.java, pants.python, pants.targets and pants.tasks be reorganized (temporarily) into a new toplevel directory:

  • pants.backend

With the following layout:

  • pants.backend.{language}.__init__
  • pants.backend.{language}.base
  • pants.backend.{language}.targets
  • pants.backend.{language}.tasks

Each backend is treated as a namespace package so that they can more easily be developed by third parties. This model works naturally with the python.new backend, and for the most part with the Java and Scala backends.
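Concretely, each backend's __init__.py could expose two hooks: one contributing names to BUILD files and one wiring tasks into phases. A minimal sketch (the module paths, class names, and hook names here are illustrative assumptions, not an existing pants API):

```
# pants/backend/python/__init__.py -- illustrative sketch of a backend's
# wiring module under the proposed layout; all names below are hypothetical.
__import__('pkg_resources').declare_namespace(__name__)  # old-style namespace package

from pants.backend.python.targets import PythonBinary, PythonLibrary  # hypothetical paths
from pants.backend.python.tasks import PytestRun, PythonRepl          # hypothetical paths


def build_file_aliases():
  # Names this backend contributes to every BUILD file's namespace.
  return {
    'python_library': PythonLibrary,
    'python_binary': PythonBinary,
  }


def register_goals(phases):
  # Wire this backend's tasks into phases.
  phases.install('test', PytestRun)
  phases.install('repl', PythonRepl)
```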

One case where this is more challenging is IDL backends. My proposal for handling this is to make the CodeGen(Task) base class in pants.base the canonical base class for code generation, and to provide specific pants.backend.thrift and pants.backend.protobuf IDL backends that other backends can then depend upon. For example, pants.backend.python would depend upon pants.backend.thrift, and subclass the ThriftGen base from pants.backend.thrift.base to create a Python-specific codegen target. (I've already done the refactor in the python.new branch to make this more straightforward through language-specific create_target abstract base classes.) So rather than the 'gen' phase with 'gen:thrift' and 'gen:protobuf' goals, you'd have the 'gen' phase with 'gen:thrift-python', 'gen:thrift-java', 'gen:protobuf-go' etc.
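For instance, the Python backend's thrift codegen task might be layered on the shared IDL backend roughly like this (a sketch only; the module paths and class names are assumptions, not the actual python.new code):

```
# Hypothetical sketch: a Python-specific thrift codegen task built on a
# shared IDL backend; module paths and names are illustrative.
from pants.backend.python.targets import PythonLibrary  # hypothetical path
from pants.backend.thrift.base import ThriftGen         # hypothetical path


class PythonThriftGen(ThriftGen):
  """Would be registered against the 'gen' phase as 'gen:thrift-python'."""

  def create_target(self, gentarget, generated_sources):
    # Language-specific hook from the abstract base: wrap the generated
    # sources in a synthetic python_library that downstream Python targets
    # can depend on.
    return PythonLibrary(name=gentarget.name + '.thrift-python',
                         sources=generated_sources)
```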

bootstrapping

To bootstrap a repository to use pants, you would do the following:

  1. copy the bootstrap_pants.py script into your repository, name it 'pants', and chmod +x it
  2. add a root BUILD file which sets up your source root and defines which plugins you'd like

The pants bootstrap script currently bootstraps itself from PyPI, though some of the dependent source dists have not yet been published (specifically twitter.pants, twitter.common-core, twitter.common.python); we can do that when we're ready – especially since twitter.pants could become 'pb' or 'pantsbuild' or something else. If you don't have external network access, you'd add a field in pants.ini that specifies where the bootstrapper should look for artifacts, and you'd cache them locally. Similarly, by default the bootstrap script bootstraps the latest version of pants, but you could pin specific versions in pants.ini using the existing Requirement format (e.g. 'pb==0.3.2' or 'pb>0.3,<1').
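For reference, such a pin can be checked with the standard Requirement machinery; a small sketch (the pin string here is just the example above, as it might be read out of pants.ini):

```
# Sketch: validating an installed pants version against a pin expressed in
# Requirement format ('pb==0.3.2', 'pb>0.3,<1', ...), via setuptools.
from pkg_resources import Requirement

pin = Requirement.parse('pb>0.3,<1')
print('0.3.2' in pin)  # True: this version satisfies the pin
print('1.0.0' in pin)  # False: outside the allowed range
```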

The bootstrapper script would write pants.pex and, on subsequent runs, 1) check the pants.pex version to make sure it's compatible with the one defined in pants.ini and, if so, 2) os.execv it.
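In outline, that subsequent-run path might look like the following (a sketch under assumptions: the version-stamp file and the pants.ini key are illustrative, and the actual resolve step is elided):

```
#!/usr/bin/env python
# Sketch of the bootstrapper's subsequent-run path: reuse a cached pants.pex
# if its version satisfies the pants.ini pin, otherwise re-bootstrap, then
# hand off via os.execv. The stamp file and ini key are assumptions.
import os
import sys

from pkg_resources import Requirement

PEX = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'pants.pex')
VERSION_STAMP = PEX + '.version'  # assumed: written next to the pex when it is built
INI_PIN = 'pb>0.3,<1'             # assumed: read from a version key in pants.ini


def pex_is_compatible():
  if not (os.path.exists(PEX) and os.path.exists(VERSION_STAMP)):
    return False
  with open(VERSION_STAMP) as fp:
    return fp.read().strip() in Requirement.parse(INI_PIN)


def bootstrap_pex():
  # Resolve pants from PyPI (or the artifact location named in pants.ini)
  # and write PEX plus VERSION_STAMP; elided in this sketch.
  raise NotImplementedError


def main():
  if not pex_is_compatible():
    bootstrap_pex()                    # step 1: (re)build a compatible pants.pex
  os.execv(PEX, [PEX] + sys.argv[1:])  # step 2: hand off to the real pants


if __name__ == '__main__':
  main()
```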

repository BUILD

The root BUILD would look like:

```
require('pants.backend.java==0.2.3')
require('pants.backend.python')
require_path('plugins/foursquare')
# [any extra project specific methods or variables defined here]
```

Each backend would be a PythonLibrary with a corresponding SetupPy artifact, which can have install_requires in order to dictate dependencies (e.g. pystache, boto, pants.backend.thrift, whatever). The upside is that 1) this is all already implemented, and 2) people could write a backend as a standard setup.py project outside of pants.
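Such an externally developed backend could be packaged as an ordinary distribution, roughly like this (the project name and install_requires entries are illustrative only):

```
# setup.py -- sketch of a third-party backend published as an ordinary
# setuptools project; the name and dependencies below are illustrative.
from setuptools import find_packages, setup

setup(
  name='pants.backend.foursquare',
  version='0.1.0',
  namespace_packages=['pants', 'pants.backend'],
  packages=find_packages(),
  install_requires=[
    'pants.backend.thrift',  # depend on the shared IDL backend
    'pystache',
    'boto',
  ],
)
```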

require_path would simply look for an __init__.py in that directory and install everything there into the root namespace.
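A minimal sketch of that behavior (illustrative only; the module-naming scheme is an assumption):

```
# Sketch of require_path: load <path>/__init__.py and install its public
# names into the BUILD root namespace. Illustrative only.
import imp
import os


def require_path(path, root_namespace):
  plugin = imp.load_source('pants_plugin_' + os.path.basename(path.rstrip('/')),
                           os.path.join(path, '__init__.py'))
  for name in dir(plugin):
    if not name.startswith('_'):
      root_namespace[name] = getattr(plugin, name)
```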

Currently, for BUILD files to be usable, we autoinject 'from pants.targets import *' and 'from twitter.common.quantity import Amount, Time' into every single one. Instead, we'd deposit the union of the namespaces created by the buildroot BUILD and the __init__.py of each registered backend into all down-tree BUILDs. The backend __init__s would also be responsible for wiring tasks into phases.
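Mechanically, evaluating a down-tree BUILD file then amounts to exec'ing it against that merged namespace, roughly (a sketch; the build_file_aliases hook is the same illustrative name used earlier):

```
# Sketch: parse a down-tree BUILD file against the union of the buildroot
# BUILD's namespace and each registered backend's contributions.
def parse_build_file(path, root_namespace, backends):
  symbols = dict(root_namespace)                  # names defined by the buildroot BUILD
  for backend in backends:
    symbols.update(backend.build_file_aliases())  # names contributed by each backend __init__
  with open(path) as fp:
    exec(compile(fp.read(), path, 'exec'), symbols)
  return symbols
```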

@jinfeng

jinfeng commented Oct 24, 2014

@jsirois What's our plan here? I'm writing because I'm hitting an issue bootstrapping pants in Twitter's pants_internal [1].

It seems we have 3 different types of pants-related repos:

  1. pantsbuild/pants:

    The originating open source pants repo. In this repo, the bootstrap script (pants at the root) either checks whether a pants.pex exists (executing it directly if found, and building then executing it if not), or runs live against the source code if PANTS_DEV=1 is specified. Running live or building a pants.pex needs lots of external-world access, like getting virtualenv from PyPI and subsequently installing requirements from PyPI during the bootstrap phase. I'm not sure whether, if pants.pex exists and is invoked directly, virtualenv would still be prepped and external PyPI downloads would still take place? It doesn't seem so, but I want to confirm.

    In this repo's pants.ini there is a [python_repo] section, but it is used after pants is up and running, when we need a new python_chroot. So that section doesn't help during the bootstrap phase unless we add code there to parse it and tell pip which repos to look into.

  2. Twitter's pants_internal:

    This is Twitter's extension to OS pants. In this repo, the bootstrap script clones OS pants from GitHub (or from an internal mirror of it, due to strict outside-access control), adds some extra Python requirements, and then directly invokes the locally cloned pantsbuild/pants bootstrap script. However, this only works when run from a machine where outside-world access is granted, even if OS_PANTS_ORIGIN points to an internal mirror, because pantsbuild/pants's bootstrap script will always try to get virtualenv and other requirements from PyPI directly. It also seems that in pants_internal the bootstrap pants script doesn't check for the existence of a pants.pex at the build root. So in short, running pants_internal directly inside an access-restricted environment doesn't work, even with OS_PANTS_ORIGIN pointing to an internal git mirror.

  3. Twitter's other repo like science/birdcage:

    These are the end repos that consume pants, i.e. they use pants to build other products. This repo's root bootstrap pants (not a bash script but a Python script) – is this the one you mentioned earlier as bootstrap_pants.py? I can't find it anywhere in pantsbuild/pants. Anyway, this script simply checks for a local pants.pex; if one exists it runs it, and if not it looks into pants.ini to find the pants PEX caching URL, downloads the PEX locally, and then executes it. Again, I'm not sure if/how virtualenv is activated in the PEX context.

My problem came from 2. above. When we want to execute tests for the pants_internal repo, meaning we want to run pants goal test against test code we've written in that repo, the bootstrap script would try to clone OS pants and then execute it, which isn't allowed in our CI environment.

So I tried using OS_PANTS_ORIGIN to point to our internal mirror of OS pants. Still not working, because the OS pants bootstrap will get virtualenv and other requirements from PyPI. Two options: 1) I could amend the OS pants bootstrap scripts somehow to make them aware of caching locations, but that seemed a bit more involved, so I opted for 2) simply using a pre-built pants.pex to run goal tests, which bypasses all these external accesses.

However, this only goes so far: it works for running simple tests. When I run a test case derived from pants_run_integration_test.py, which shells out and calls the pants_internal repo root's pants bootstrap script, it again starts to clone OS pants, since that script doesn't check for a local pants.pex.

So my questions are (sorry for typing so much, it took me a while to unravel these different code paths during bootstrapping):

  1. To solve my immediate problem: I'm thinking about adding a pants.pex existence check to the pants_internal pants script, so it's a bit similar to the OS pants bootstrap and the pants-consuming repo bootstrap scripts. Does that sound right?
  2. As you mentioned above, is there any concrete plan for making the OS pants bootstrap aware of an internal cache, so we can directly execute OS pants inside the firewall?
  3. It would be nice to either document or have code somewhere that clearly lays out how to set up repos like 2 (a pants plugin/backend development repo) and 3 (a pants-consuming end-product repo): which templatized bootstrap scripts to use, how additional requirements can be added, how caches should work, etc.

[1] For non-current/former Twitter pants devs: this pants_internal repo isn't an end-product git repo that uses pants to build; it's a repo inside Twitter where we develop pants plugins/backends and wrap the open source pants (or, more precisely, insert our plugins/backends into open source pants).

@ericzundel
Member

I think this is complete. Close?

@lahosken
Contributor

lahosken commented Feb 4, 2015

We haven't achieved the stuff in the "bootstrapping" section. But I think we have achieved the code-layout parts of working towards that "bootstrapping" section.

@jsirois
Contributor Author

jsirois commented Feb 5, 2015

We do need a bootstrap script, much like gradlew, that can bootstrap a virtualenv for pantsbuild.pants to be pip-installed into. But we are also still short on the 'repository BUILD' section. Namely, we cannot bootstrap 3rdparty deps – including plugins – for you today, and we need to.

@stuhood
Member

stuhood commented Feb 12, 2016

All issues mentioned in John's most recent comment have been addressed, and the packaging concerns have all been invalidated (if not fixed).

@stuhood stuhood closed this as completed Feb 12, 2016
kwlzn added a commit that referenced this issue Mar 31, 2017
…n test. (#4407)

### Problem

Currently, on Linux the first thin client call to the daemon can deadlock just after the pantsd->fork->pantsd-runner workflow. Connecting to the process with `gdb` reveals a deadlock in the following stack in the `post_fork` `drop` of the `CpuPool`:

```
#0  0x00007f63f04c31bd in __lll_lock_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007f63f04c0ded in pthread_cond_signal@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#2  0x00007f63d3cfa438 in notify_one ()
    at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys/unix/condvar.rs:52
No locals.
#3  notify_one () at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys_common/condvar.rs:39
No locals.
#4  notify_one () at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sync/condvar.rs:208
No locals.
#5  std::thread::{{impl}}::unpark () at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/thread/mod.rs:633
No locals.
#6  0x00007f63d3c583d1 in crossbeam::sync::ms_queue::{{impl}}::push<futures_cpupool::Message> (self=<optimized out>, t=...)
    at /home/kwilson/.cache/pants/rust-toolchain/registry/src/jackfan.us.kg-1ecc6299db9ec823/crossbeam-0.2.10/src/sync/ms_queue.rs:178
        guard = <optimized out>
        self = <optimized out>
#7  0x00007f63d3c588ed in futures_cpupool::{{impl}}::drop (self=<optimized out>)
    at /home/kwilson/.cache/pants/rust-toolchain/git/checkouts/futures-rs-a4f11d094efefb0a/f7e6bc8/futures-cpupool/src/lib.rs:236
        self = 0x37547a0
#8  0x00007f63d3be871c in engine::fs::{{impl}}::post_fork (self=0x3754778) at /home/kwilson/dev/pants/src/rust/engine/src/fs.rs:355
        self = 0x3754778
#9  0x00007f63d3be10e4 in engine::context::{{impl}}::post_fork (self=0x37545b0)
    at /home/kwilson/dev/pants/src/rust/engine/src/context.rs:93
        self = 0x37545b0
#10 0x00007f63d3c0de5a in {{closure}} (scheduler=<optimized out>) at /home/kwilson/dev/pants/src/rust/engine/src/lib.rs:275
        scheduler = 0x3740580
#11 with_scheduler<closure,()> (scheduler_ptr=<optimized out>, f=...) at /home/kwilson/dev/pants/src/rust/engine/src/lib.rs:584
        scheduler = 0x3740580
        scheduler_ptr = 0x3740580
#12 engine::scheduler_post_fork (scheduler_ptr=0x3740580) at /home/kwilson/dev/pants/src/rust/engine/src/lib.rs:274
        scheduler_ptr = 0x3740580
#13 0x00007f63d3c1be8c in _cffi_f_scheduler_post_fork (self=<optimized out>, arg0=0x35798f0) at src/cffi/native_engine.c:2234
        _save = 0x34a65a0
        x0 = 0x3740580
        datasize = <optimized out>
#14 0x00007f63f07b5a62 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
```

This presents as a hang in the thin client, because the pailgun socket is left open in the pantsd-runner.

### Solution

Add pre-fork hooks and tear down the `CpuPool` instances prior to forking and rebuilding them.

### Result

Can no longer reproduce the hang.
lenucksi pushed a commit to lenucksi/pants that referenced this issue Apr 25, 2017
…n test. (pantsbuild#4407)

Eric-Arellano referenced this issue in Eric-Arellano/pants in several commits, Jul 15–20, 2019