Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[streaming] Line breaks and buffer overflows (broken pipes) #104

Open
13k opened this issue May 31, 2024 · 1 comment · May be fixed by #106
Open

[streaming] Line breaks and buffer overflows (broken pipes) #104

13k opened this issue May 31, 2024 · 1 comment · May be fixed by #106

Comments

@13k
Copy link

13k commented May 31, 2024

This is similar to #4, but it's more related to child process output content rather than output/buffer sizes.

Child processes that output a lot of data, or run long enough, but don't emit line breaks will eventually cause buffer overflows and therefore broken pipe errors.

This is easily reproducible:

package main

import (
	"fmt"
	"log"
	"math/rand/v2"
	"os"
	"time"

	"github.com/go-cmd/cmd"
)

func main() {
	if len(os.Args) > 1 {
		server()
	} else {
		client()
	}
}

func server() {
	ttl := time.Second * 10
	timeout := time.After(ttl)

	fmt.Printf("[server] start (running for %s)\n", ttl)

	for {
		n := time.Duration(rand.Int64N(10))
		step := time.After(time.Millisecond * n)

		select {
		case <-step:
			fmt.Printf("\033[2K\r[server] %d", n)
		case <-timeout:
			fmt.Println("\033[2K\r[server] exit")
			return
		}
	}

}

func client() {
	selfExe, err := os.Executable()

	if err != nil {
		panic(err)
	}

	svCmd := []string{selfExe, "server"}

	log.Printf("[client] launching server: %q", svCmd)

	cmd := cmd.NewCmdOptions(cmd.Options{Streaming: true}, selfExe, "server")

	go func() {
		for output := range cmd.Stdout {
			fmt.Fprintln(os.Stdout, output)
		}
	}()

	go func() {
		for output := range cmd.Stderr {
			fmt.Fprintln(os.Stderr, output)
		}
	}()

	log.Println("[client] wait server")

	status := <-cmd.Start()

	if status.Error != nil {
		log.Fatal(status.Error)
	}

	log.Printf("[client] server status: %v", status)
	log.Println("[client] exit")
}

You can notice that in each server loop iteration, it doesn't emit a line break to the output, it uses a ANSI control sequence (\033[2K) to clear the line and return the cursor to the start of the line (\r).

go-cmd will keep accumulating output, because it doesn't see a \n and will never send a message to the output channel. This causes two issues.

The overflow issue

Eventually go-cmd's internal buffer overflows:

> go run streaming.go
2024/05/31 14:17:18 [client] launching server: ["/tmp/go-build653253379/b001/exe/streaming" "server"]
2024/05/31 14:17:18 [client] wait server
[server] start (running for 10s)
2024/05/31 14:17:24 signal: broken pipe
exit status 1

I believe there are no buffer size adjustments that could solve this.

The issue with buffering based on assumptions about the data format

go-cmd assumes the output will be line oriented and sends channel messages for each line.

I imagine this is done for convenience, but it breaks streaming flows that could be important, like showing progress status (e.g.: rclone, aria2c, wget, curl, etc).

Even for simpler cases (ANSI control sequences are not used) or smaller cases (all of the output would fit in the buffer), for example when a process prints progress on the same line (like a series of dots or percentages), go-cmd's implementation breaks those flows (e.g.: some mkfs implementations do this if I remember correctly).

Another broken use case would be processes that emit very long lines. It could be the case for minifiers/uglyfiers/obfuscators, or language servers (LSPs) that are known to emit long JSON lines to stdio, for example. go-cmd would have to keep increasing buffer size to accomodate for this.

Proposed solution

I believe the solution to both issues is to just do byte oriented streaming with buffering: don't make assumptions about data format and emit chunks downstream whenever the buffer is filled. The client is responsible to reassemble the data if needed.

@daniel-nichter
Copy link
Member

daniel-nichter commented Jun 23, 2024

Right, go-cmd is inherently line-oriented/buffered. It's original (and still primary) use case is standard Linux command output.

A byte-oriented streaming buffer would be a nice addition. Off top of my head, the design probably needs both a buffer size and a flush timeout/interval. Let's say buff size is 1k but only 900B are received and upstream pauses for a long time. Should go-cmd wait for the remaining 100B to flush? Or flush what it's go so far (900B) after some time?

You or someone else want to write a PR? Or, if others want this new capability, 👍 the original post.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants