severe memory leak with piped cat #270

Closed
oconnor663 opened this issue Aug 23, 2015 · 4 comments

@oconnor663

Run the following in an interactive interpreter:

```python
>>> import sh
>>> sh.head(sh.cat('/dev/urandom', _piped=True), '-c', '10')
b'\xc9UJh\x8e\xe7?T\xa2\x02'
```

Now, without closing that interpreter, open `top` and sort by memory usage. You'll see the Python process's memory usage climbing steadily toward 100%. Wait long enough and it will lock up the machine. If you sort by CPU instead, you'll see that the `cat` process is still running.

When I try the same with `/dev/zero`, the call never returns. `top` shows Python using 100% CPU, in addition to the same memory leak as above.

@amoffat
Owner

amoffat commented Aug 23, 2015

A process internally stores all of the data that flows through it. Otherwise you would never be able to see the output of a command: these aren't real function calls with return values, they are evaluated objects.
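To make the cost concrete, here is a minimal plain-Python sketch of a relay that forwards data and also keeps a copy of everything it has seen. `RecordingRelay` is a hypothetical name, not part of sh's API or its actual internals; the point is only that the retained history grows with every byte forwarded, which is unbounded when the source is `/dev/urandom`.

```python
import os
from typing import BinaryIO, List


class RecordingRelay:
    """Hypothetical stand-in for a buffering pipe (not sh's real internals).

    Every chunk is forwarded to the sink *and* appended to an in-memory
    history, so memory use grows with the total number of bytes transferred.
    """

    def __init__(self, sink: BinaryIO) -> None:
        self.sink = sink
        self.history: List[bytes] = []

    def feed(self, chunk: bytes) -> None:
        self.history.append(chunk)  # retained copy: kept for as long as the relay lives
        self.sink.write(chunk)      # forwarded copy: what the downstream consumer sees

    def retained_bytes(self) -> int:
        return sum(len(c) for c in self.history)


if __name__ == "__main__":
    # The sink discards everything, yet the relay still holds every byte.
    with open(os.devnull, "wb") as sink:
        relay = RecordingRelay(sink)
        for _ in range(1000):
            relay.feed(b"\x00" * 4096)
        print(relay.retained_bytes())  # 4096000
```

With an endless source, `history` never stops growing, which matches the memory climb reported above.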

There's a lesser-known option to `_piped` that connects two processes directly via their file descriptors. Shame on me for not merging it into the docs yet, but it should prevent your memory from growing: https://github.com/amoffat/sh/pull/243/files
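The general idea of connecting two processes "directly via their file descriptors" can be illustrated with Python's standard `subprocess` module. This is a swapped-in illustration, not sh's implementation: the first process's stdout pipe is handed to the second process as its stdin, so the bytes never pass through Python and nothing accumulates in the interpreter. `direct_pipe_head` is a hypothetical helper, and the sketch assumes a POSIX system with `cat`, `head` and `/dev/urandom`.

```python
import signal
import subprocess
from typing import Tuple


def direct_pipe_head(n_bytes: int) -> Tuple[bytes, int]:
    """Equivalent of `cat /dev/urandom | head -c N` using fd-to-fd piping.

    Hypothetical helper for illustration; returns (head's output, cat's return code).
    """
    # cat writes into an OS pipe; Python never reads from that pipe itself.
    cat = subprocess.Popen(["cat", "/dev/urandom"], stdout=subprocess.PIPE)
    # head reads straight from cat's pipe: the fds are linked, no copying in Python.
    head = subprocess.Popen(
        ["head", "-c", str(n_bytes)], stdin=cat.stdout, stdout=subprocess.PIPE
    )
    # Drop the parent's copy of the read end so that, once head exits,
    # no reader remains and cat's next write triggers SIGPIPE.
    cat.stdout.close()
    out, _ = head.communicate()
    cat.wait()  # cat terminates on its own instead of running forever
    return out, cat.returncode


if __name__ == "__main__":
    data, rc = direct_pipe_head(10)
    print(len(data))              # 10
    print(rc == -signal.SIGPIPE)  # True when cat was terminated by SIGPIPE
```

Closing the parent's copy of `cat.stdout` matters: if Python kept it open, `cat` would never see a broken pipe and would keep running after `head` exits.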

@oconnor663
Author

Cool. What's the case where I'd want `_piped=True` but not `_piped="direct"`?

@amoffat
Owner

amoffat commented Aug 23, 2015

There may not be one. `_piped=True` came first, when the piping mechanism was introduced, and `_piped="direct"` came years later. It arguably has the benefit of retaining the data that flowed through it for later processing, but I've personally never taken advantage of that. More and more, people just want to pipe quickly without memory overhead, so it may be that `_piped=True` should work like `_piped="direct"` under the hood.

@oconnor663
Author

Even if we keep the copying, I hope there would be a way to do something like what `SIGPIPE` does, and stop the copying subprocess once there's nothing left for it to copy to.
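A copying relay could approximate that behaviour by treating a broken pipe as the signal to stop. The sketch below is one assumed way to do it, not sh's code: CPython ignores `SIGPIPE` by default, so writing to a pipe whose read end has been closed raises `BrokenPipeError`, and the hypothetical `copy_until_reader_gone` helper uses that to stop copying from an endless source instead of buffering forever.

```python
import itertools
import os
import threading
from typing import Iterable


def copy_until_reader_gone(chunks: Iterable[bytes], write_fd: int) -> int:
    """Copy chunks into write_fd until the source ends or no reader remains.

    Hypothetical helper; returns the number of bytes written before stopping.
    """
    copied = 0
    try:
        for chunk in chunks:
            copied += os.write(write_fd, chunk)
    except BrokenPipeError:
        # Read end closed: the userspace analogue of SIGPIPE, so stop copying.
        pass
    finally:
        os.close(write_fd)
    return copied


def demo() -> int:
    read_fd, write_fd = os.pipe()

    def consumer() -> None:
        os.read(read_fd, 10)  # take a little data, like `head -c 10`
        os.close(read_fd)     # then go away

    t = threading.Thread(target=consumer)
    t.start()
    # An endless source, like /dev/urandom: the loop only ends via BrokenPipeError.
    n = copy_until_reader_gone(itertools.repeat(b"x" * 1024), write_fd)
    t.join()
    return n
```

`demo()` returns once the consumer has gone away; the exact byte count depends on how much fit in the OS pipe buffer before the read end was closed.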

0-wiz-0 added a commit to NetBSD/pkgsrc-wip that referenced this issue Dec 12, 2016
*   added `_out` and `_out_bufsize` validator [#346](amoffat/sh#346)
*   bugfix for internal stdout thread running when it shouldn't [#346](amoffat/sh#346)

*   regression bugfix on timeout [#344](amoffat/sh#344)
*   regression bugfix on `_ok_code=None`

*   further improvements on cpu usage

*   regression in cpu usage [#339](amoffat/sh#339)

*   fd leak regression and fix for flawed fd leak detection test [#337](amoffat/sh#337)

*   support for `io.StringIO` in python2

*   added support for using raw file descriptors for `_in`, `_out`, and `_err`
*   removed `.close()`ing `_out` handler if FIFO detected

*   composed commands no longer propagate `_bg`
*   better support for using `sys.stdin` and `sys.stdout` for `_in` and `_out`
*   bugfix where `which()` would not stop searching at the first valid executable found in PATH
*   added `_long_prefix` for programs whose long arguments start with something other than `--` [#278](amoffat/sh#278)
*   added `_log_msg` for advanced configuration of log message [#311](amoffat/sh#311)
*   added `sh.contrib.sudo`
*   added `_arg_preprocess` for advanced command wrapping
*   alter callable `_in` arguments to signify completion with falsy chunk
*   bugfix where pipes passed into `_out` or `_err` were not flushed on process end [#252](amoffat/sh#252)
*   deprecated `with sh.args(**kwargs)` in favor of `sh2 = sh(**kwargs)`
*   made `sh.pushd` thread safe
*   added `.kill_group()` and `.signal_group()` methods for better process control [#237](amoffat/sh#237)
*   added `new_session` special keyword argument for controlling spawned process session [#266](amoffat/sh#266)
*   bugfix better handling for EINTR on system calls [#292](amoffat/sh#292)
*   bugfix where with-contexts were not threadsafe [#247](amoffat/sh#195)
*   `_uid` new special keyword param for specifying the user id of the process [#133](amoffat/sh#133)
*   bugfix where exceptions were swallowed by processes that weren't waited on [#309](amoffat/sh#309)
*   bugfix where processes that dup'd their stdout/stderr to a long running child process would cause sh to hang [#310](amoffat/sh#310)
*   improved logging output [#323](amoffat/sh#323)
*   bugfix for python3+ where binary data was passed into a process's stdin [#325](amoffat/sh#325)
*   Introduced execution contexts which allow baking of common special keyword arguments into all commands [#269](amoffat/sh#269)
*   `Command` and `which` now can take an optional `paths` parameter which specifies the search paths [#226](amoffat/sh#226)
*   `_preexec_fn` option for executing a function after the child process forks but before it execs [#260](amoffat/sh#260)
*   `_fg` reintroduced, with limited functionality.  hurrah! [#92](amoffat/sh#92)
*   bugfix where a command would block if passed a fd for stdin that wasn't yet ready to read [#253](amoffat/sh#253)
*   `_long_sep` can now take `None` which splits the long form arguments into individual arguments [#258](amoffat/sh#258)
*   making `_piped` perform "direct" piping by default (linking fds together).  this fixes memory problems [#270](amoffat/sh#270)
*   bugfix where calling `next()` on an iterable process that has raised `StopIteration`, hangs [#273](amoffat/sh#273)
*   `sh.cd` called with no arguments now changes into the user's home directory, like native `cd` [#275](amoffat/sh#275)
*   `sh.glob` removed entirely.  the rationale is correctness over hand-holding. [#279](amoffat/sh#279)
*   added `_truncate_exc`, defaulting to `True`, which tells our exceptions to truncate output.
*   bugfix for exceptions whose messages contained unicode
*   `_done` callback no longer assumes you want your command put in the background.
*   `_done` callback is now called asynchronously in a separate thread.
*   `_done` callback is called regardless of exception, which is necessary in order to release held resources, for example a process pool