Add `exec` onramp/offramp #20

Licenser · 2020-04-16T10:35:39Z

No description provided.

anupdhml · 2020-04-17T18:21:16Z

text/0009-onramp-offramp-exec.md

+does not make sense.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation


as discussed in the meeting earlier today: let's define how stdout and stderr output from the exec commands are treated by the onramp. Could be useful to forward them as distinct outputs from onramp, such that users can connect them differently for their pipeline needs.

Also, would be good to set the command executed as part of the event's origin uri. Users can access that metadata from tremor-script and use it as needed: https://docs.tremor.rs/tremor-script/functions/origin/.

marioortizmanero · 2021-03-30T12:49:56Z

I've come up with a few possible ways to run a given command from the config for the offramp implementation.

The current RFC doesn't specify if stuff like pipes and redirections are going to be supported, so I'll try to take them into account and see if it's possible. The solutions are actually more complicated because it depends on the Operating System it's running on, but I'll talk about how it would work on Unix systems:

By running the command with sh -c "<command>" and later replacing {} with the event. For example, sh -c "curl -d status=\"{}\" api.twitter.com/status". This means that tremor would have to escape certain sequences like ", `, $ or '. But it's far from trivial because these escape sequences depend on what's wrapping the {}:
- If you're using "{}", you wouldn't have to escape '.
- If you're using '{}', you wouldn't have to escape ".
- If you're using an unquoted {}, you have to escape lots of other characters, like the space itself, |, >...
Writing the data into a file or pipe and forcing the user to use that instead of {}. For example you'd need cat /tmp/tremor instead of echo "{}". But this may introduce some overhead and I'm not sure how it could work for concurrent streams.
By splitting the string by whitespace and using std::process::Command step by step to run the command. This makes things like pipes and redirections complicated because at this point we're reimplementing the shell. But the good part is that we don't really need to escape anything.

I've been taking a look at how others do this:

https://github.com/uutils/findutils doesn't even escape the -exec echo \{\} \;-type commands because it uses std::process::Command when splitting them up.
https://github.com/google/rust-shell is a library that does literally what we want to do. It's a very hand-crafted solution with more complexity than I expected, as it spans a few files and implements a custom parser. Unfortunately, it only works for Unix-like systems and it's been abandoned for 3 years. I've also seen some bugs and missing features, like ~ not being expanded to $HOME. Not to mention that pipes and redirects cause a panic. This one seems to run the command split up with std::process::Command

In conclusion, if we want to support stuff like pipes the only viable way to do it is with sh -c. But even that is really complicated to implement, so we should only support simple commands. And in that case the best way is to split up the command by whitespace and use std::process::Command. We should assume that if the user wants more features like environment variables, redirects or pipes, they can just create a script and run that instead. But our command execution logic should be as simple as possible because otherwise it's a mess and not portable at all.

marioortizmanero · 2021-03-30T12:57:01Z

I'm bringing this issue up because I would like to try implementing it to warm up for the GSoC. After trying to find out how the command could be ran, another thing to take into account is whether we should use OsStr/OsString or str/String. The incoming data is in Vec<u8> after running the post-processing, so we'd have to parse it into some kind of string std::process::Command can handle.

The more correct way would be using OsStr/OsString -- they exist for a reason -- but that means operations like replace aren't supported and it could be more complicated.

Can we assume all commands input by the user will parse correctly with str/String?

ernadh · 2021-03-30T13:44:33Z

Take a look at this PR for an implementation of an exec onramp that utilizes rust-subprocess crate. The same approach was meant to be used for the offramp. You'll find there some answers to the questions related to redirection and whether data needs to be stringified.

As far as escaping input goes, I believe the Twitter example was just sending empty status for each event rather than embedding an event in the actual command. A useful implementation that carries the similar spirit can be found in Telegraf's continuous exec output and example configuration here. The non-continuous version can be found here, though I believe we didn't mean to differentiate between the two logically but via configuration in Tremor.

marioortizmanero · 2021-03-30T19:35:04Z

Take a look at this PR for an implementation of an exec onramp that utilizes rust-subprocess crate. The same approach was meant to be used for the offramp. You'll find there some answers to the questions related to redirection and whether data needs to be stringified.

Huh I like how you think. So you just create a shell script file with the command and execute that with whatever shell is configured. Though I don't see that much of a necessity for rust-subprocess here because the offramp is simpler in the sense that you don't have to poll the command. You just execute it and wait for its termination, which AFAIK can be done with std::subprocess. Anyhow, I could use it for consistency with the onramp.

In that PR the UTF-8 part isn't really discussed so I would still like to get some opinions on that.

As far as escaping input goes, I believe the Twitter example was just sending empty status for each event rather than embedding an event in the actual command.

Oh I completely misunderstood it then. Using stdin sounds like a better idea to pass the event; I thought the proposal meant that {} would be replaced.

ernadh · 2021-03-30T20:03:07Z

Data should really be an octet stream on which codec and processors can be applied. That PR is unfinished but the latest iteration we were working on is not forcing UTF-8 data.

rust-subprocess was used due to different types of execution like continuous or periodical for which std::process was not sufficient. There's a case to be made for an offramp that's expensive to start on every event that one should be able to run continuously but that's likely not rev0 of this offramp. cc: @tremor-rs/tremor-core

marioortizmanero · 2021-03-31T14:47:55Z

And now that I know no replacing/escaping is needed, why create a file instead of just using ["sh/bash/zsh", "-c", command]? The PR seems to have an args field for ExecStream, but I don't understand why you'd need that for an already configurable command (it's always empty in the code), maybe I'm missing another feature?.

Also, I like the idea of a continuous script as well. As you said, I wouldn't implement it in the first attempt but I'll definitely take it into account. And I'll give rust-subprocess a try.

ernadh · 2021-03-31T16:22:59Z

Those are good questions -- needless to say, take my answers as reflection of the type of thinking behind the work in that PR rather than conclusive in terms of the final solution for your work.

A file allows you to support executing scripts that are multi-line, have functions defined, loops, supporting multiple commands in one stream, etc. This allows for having Tremor execute rather complex scripts (say you're hitting the API, passing a complex script and asking for it to be executed once). Nothing prevents you from passing in one command to be executed in such an onramp but designing solely for std::process::Command would require additional orchestration outside of Tremor for execution of a complex script that does not fit into std::process::Command interface.

For example, a common scenario with Telegraf's exec is wrapping multiple commands (think gathering and processing telemetry on a Linux system with eBPF) in a file and registering that as one input. The scenario implies that prior to starting our collection agent (Telegraf) we need to ensure that this particular file is present on the system, perhaps using some configuration management tool like Puppet or by manually placing it in there. We can avoid that type of work by designing Tremor's exec onramp in such a way that it "feels" closer to the underlying OS than what command execution package in Go or Rust give us.

Args indeed aren't exposed as top-level configuration option in that PR because they apply for execution of the shell command, rather than the user-provided script. Args aren't empty -- they will contain the filename and on Windows (not reflexted in that PR) they will have an additional item "/C".

marioortizmanero · 2021-03-31T23:53:07Z

A file allows you to support executing scripts that are multi-line, have functions defined, loops, supporting multiple commands in one stream, etc.

So does sh -c, sorry, I think I explained myself incorrectly. I'm just talking about simplicity, because you don't need to create a file beforehand. But this is probably not that important anyway.

ernadh · 2021-04-01T19:17:35Z

I see what you mean. That totally makes sense for the example you provided.

There is a case, though, where you do want to have the script as an executable file. Take a script like this.

Add exec onramp/offramp RFC

26ca57a

Licenser marked this pull request as draft April 17, 2020 14:17

anupdhml reviewed Apr 17, 2020

View reviewed changes

darach added active The RFC is active and the implementation is a work in progress enhancement New feature or request labels Apr 28, 2020

Licenser changed the base branch from master to main August 10, 2020 09:50

darach changed the title ~~Add exec onramp/offramp RFC~~ Add exec onramp/offramp Jan 29, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `exec` onramp/offramp #20

Add `exec` onramp/offramp #20

Licenser commented Apr 16, 2020

anupdhml Apr 17, 2020

marioortizmanero commented Mar 30, 2021 •

edited

Loading

marioortizmanero commented Mar 30, 2021

ernadh commented Mar 30, 2021

marioortizmanero commented Mar 30, 2021

ernadh commented Mar 30, 2021

marioortizmanero commented Mar 31, 2021 •

edited

Loading

ernadh commented Mar 31, 2021

marioortizmanero commented Mar 31, 2021 •

edited

Loading

ernadh commented Apr 1, 2021

Add exec onramp/offramp #20

Are you sure you want to change the base?

Add exec onramp/offramp #20

Conversation

Licenser commented Apr 16, 2020

anupdhml Apr 17, 2020

Choose a reason for hiding this comment

marioortizmanero commented Mar 30, 2021 • edited Loading

marioortizmanero commented Mar 30, 2021

ernadh commented Mar 30, 2021

marioortizmanero commented Mar 30, 2021

ernadh commented Mar 30, 2021

marioortizmanero commented Mar 31, 2021 • edited Loading

ernadh commented Mar 31, 2021

marioortizmanero commented Mar 31, 2021 • edited Loading

ernadh commented Apr 1, 2021

Add `exec` onramp/offramp #20

Add `exec` onramp/offramp #20

marioortizmanero commented Mar 30, 2021 •

edited

Loading

marioortizmanero commented Mar 31, 2021 •

edited

Loading

marioortizmanero commented Mar 31, 2021 •

edited

Loading