Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/debug: add SetCrashOutput(file *os.File) #42888

Closed
egonelbre opened this issue Nov 30, 2020 · 134 comments
Closed

runtime/debug: add SetCrashOutput(file *os.File) #42888

egonelbre opened this issue Nov 30, 2020 · 134 comments

Comments

@egonelbre
Copy link
Contributor

Currently there are difficulties with nicely capturing crash output from Go programs. Crashes print to stderr, however, usually both stdout and stderr are used by programs for other purposes. While it's not wrong to output it to stderr, it mixes two outputs together making later separation more difficult.

Capturing crashes is useful for post-mortem debugging and sending reports. Of course, due to unknown crashed program state using a custom callback is understandably dangerous.

A simpler alternative would be to allow changing file descriptor where to print crashes. In the simplest form it could look like:

package main
import "runtime/debug"

func init(){
	debug.SetCrashOutputFD(3)
}

This would allow passing in a fd for separate file, pipe to another "crash monitor" process or connect to an external server. Of course there's a slight possibility of write taking a long time, when passing in a pipe/connection slowing down the crashing of the program.

With regards to naming it could also be SetCrashFD, SetTracebackFD or SetTracebackOutputFD.

@gopherbot gopherbot added this to the Proposal milestone Nov 30, 2020
@ianlancetaylor ianlancetaylor changed the title proposal: allow setting fd for crash output proposal: runtime/debug: allow setting fd for crash output Nov 30, 2020
@prattmic
Copy link
Member

As a data point, the gVisor project wants to redirect panic/throw output. It works around this problem by dup'ing over stderr with the FD of the desired crash output location (https://cs.opensource.google/gvisor/gvisor/+/master:runsc/cli/main.go;l=194-198?q=dup3&ss=gvisor%2Fgvisor). stderr is otherwise unused by the program, so this works OK. This is a workable solution, but it would certainly be nicer to be able to change the destination directly.

@mvdan
Copy link
Member

mvdan commented Nov 30, 2020

Thinking outloud, the runtime prints other output besides just crashes, such as what one can enable via GODEBUG. Should that be included in this option?

@prattmic
Copy link
Member

I'd think (initially, at least), yes. I'd probably call this something like SetRuntimeFD for any runtime-generated output.

On the other hand, there are nice opportunities if a separate "fatal" FD could be provided that only includes fatal panic/throw output, plus any related prints immediately before a throw. That would allow a nice simplification of "any bytes received on this FD indicates a crash". But it means extra complexity for a rather edge case [1].

[1] I can't think of any non-fatal runtime output that isn't behind a GODEBUG option.

@egonelbre
Copy link
Contributor Author

egonelbre commented Nov 30, 2020

I would say, yes, to including other runtime output as well. If there's a need for additional separation it can be always introduced later. But, I wouldn't be also opposed to introducing it immediately either.

I think the fatal output can be detected when the fd write closes due to the program stopping.

@alexbrainman
Copy link
Member

@egonelbre how do you propose to use your API on Windows?

I agree. I wanted this feature to capture crash dumps of a Windows service.

/cc @zx2c4

Alex

@zx2c4
Copy link
Contributor

zx2c4 commented Dec 1, 2020

I've handled this until now with defer/recover in each go routine, which is really very subpar and doesn't handle all crashes.

Rather than setting an output fd, what about setting an output callback? There'd be certain restrictions on what it could do, obviously, but it could still probably accomplish most things that people want. In my case, it'd just copy the bytes to some mmap'd ring buffer log file.

@egonelbre
Copy link
Contributor Author

@alexbrainman it would behave in a similar manner that you can provide a fd that outputs to a file or a pipe. Flushing it to Event Log would require some external "crash reporting" service that reads that file or pipe. This isn't ideal, but would allow better handling than what can be currently done. Ideal scenario would be to write directly to event-log, however that would probably require specialized machinery.

@zx2c4 I believe the callback has been proposed a few times, however I cannot find the exact proposals. I think the fundamental issue is that you don't know really much about the crashed system - what's broken and what's not. e.g. maybe you got a panic during sigprof where some code holds a runtime lock "x", which your code needs and you'll deadlock when calling file.Write. This is all hypothetical of course.

@zx2c4
Copy link
Contributor

zx2c4 commented Dec 1, 2020

what's not. e.g. maybe you got a panic during sigprof where some code holds a runtime lock "x", which your code needs and you'll deadlock when calling file.Write. This is all hypothetical of course.

Right. If you're using debug.SetCrashCallback, then you are signing up for that kind of fun, and it's up to to write Go code that calls the minimal set of things needed. If the runtime can already do that for spitting to a fd, so can a custom handler function.

Ideal scenario would be to write directly to event-log, however that would probably require specialized machinery.

Writing to event log isn't too bad. It'd support the callback-based approach I mentioned.
(In my case, log writes look like this: https://git.zx2c4.com/wireguard-windows/tree/ringlogger/ringlogger.go#n105 )

@prattmic
Copy link
Member

prattmic commented Dec 1, 2020

Right. If you're using debug.SetCrashCallback, then you are signing up for that kind of fun, and it's up to to write Go code that calls the minimal set of things needed. If the runtime can already do that for spitting to a fd, so can a custom handler function.

This means no heap allocation, no stack splits, no map access (IIRC), particularly if you want this to cover runtime throws. Not impossible, but a pretty high bar to set for an API.

@zx2c4
Copy link
Contributor

zx2c4 commented Dec 1, 2020

This means no heap allocation, no stack splits, no map access (IIRC), particularly if you want this to cover runtime throws. Not impossible, but a pretty high bar to set for an API.

But also not so out of place for something in debug, right? We're offering some way to hook into runtime internals, and with that comes the responsibilities of being in the runtime. But maybe there's a better idea here:

The other way of doing this might be to have an unstable and unexported function runtime.setCrashHook, which can then be used by wrapper packages like:

//go:linkname setCrashHook runtime.setCrashHook

That then represents the lowest level. At a higher level, then the debug package can use that to implement things like debug.SetCrashFD and the Windows service package can implement things like svc.SetCrashEventlog. And then most users use these normal functions that our libraries can provide.

Meanwhile, insane users like me can dip down into runtime.setCrashHook, along with all of its dangers, for hooking it up my custom ringbuffer mmap'd logger thing, knowing full well that if anything goes wrong, this is unsupported and my fault, etc.

@ianlancetaylor
Copy link
Member

The restrictions on an output callback would be severe, as @prattmic says, and failure modes would be unpredictable. In Go we try to avoid that kind of deeply unsafe operation.

I think it would be better to keep this issue focused on the much safer operation of setting a file descriptor.

That said, I don't understand what would happen if the descriptor is a pipe that nothing is reading from. Or a file opened on a networked file system that is not responding. What should do in cases like that?

@zx2c4
Copy link
Contributor

zx2c4 commented Dec 2, 2020

The restrictions on an output callback would be severe, as @prattmic says, and failure modes would be unpredictable. In Go we try to avoid that kind of deeply unsafe operation.

What do you think of my proposal above of allowing this to be a general callback via go:linkname -- i.e. just for libraries? That's a mere "implementation detail", but it'd go a long way of enabling this to be extensible for the adventurous.

@prattmic
Copy link
Member

prattmic commented Dec 2, 2020

That said, I don't understand what would happen if the descriptor is a pipe that nothing is reading from. Or a file opened on a networked file system that is not responding. What should do in cases like that?

I'm not sure we need to worry about those (beyond documentation, perhaps). stderr could already be any of those kinds of descriptors, as set by the parent process, so the same problems would exist today.

@egonelbre
Copy link
Contributor Author

That said, I don't understand what would happen if the descriptor is a pipe that nothing is reading from. Or a file opened on a networked file system that is not responding. What should do in cases like that?

I guess this is the main danger with the proposal. As @prattmic mentioned, somebody could pipe stderr to somewhere that isn't being read. I didn't test it, but I think that would block in the same way.

Using a non-blocking write that drops data when the fd is not responding would be nice, however I suspect that would be difficult to implement.

It'll definitely need examples how to write the listening side.

Other than that, I don't have any further ideas.

@egonelbre
Copy link
Contributor Author

This might be still difficult due to the constraints, but if the writing is on a different thread and the previous write hasn't finished in appropriate time, the write can be dropped. The threshold could be configured via GODEBUG or similar.

@rsc
Copy link
Contributor

rsc commented Dec 2, 2020

Should we just have GODEBUG=debugfd=3?

/cc @aclements

@ianlancetaylor
Copy link
Member

@zx2c4 I'm not fond of documenting a callback to use with go:linkname either. I tend to think that people who operate at that level can customize their Go distribution.

@aclements
Copy link
Member

I'd be okay with GODEBUG=debugfd=3. It's not straightforward to implement, though, since we don't distinguish between runtime output and anything else printed by print(ln). It would actually be much easier to implement just for tracebacks because we could use something like the gp.writebuf redirection, just to an FD instead. But maybe that's better anyway?

I agree with @alexbrainman that I'm not sure how this would work on Windows. Internally, we treat any "fd" other than 1 or 2 as a raw Windows handle value, but I don't know if such a thing can be meaningfully passed as an environment variable. Is it possible for a parent process to create a handle and pass it to a child like this?

On the topic of a callback, in addition to the subtle and rather onerous restrictions on the callback, those restrictions can also change with new releases. We don't expose anything remotely like that right now.

@egonelbre
Copy link
Contributor Author

egonelbre commented Dec 2, 2020

I did think about GODEBUG, but I suspect that would be complicated to use in Windows services since you don't have a nice way to figure out the fd prior to starting it. The main process needs to respond to messages (https://github.com/golang/sys/blob/ef89a241ccb347aa16709cf015e91af5a08ddd33/windows/svc/example/service.go#L23).

debug.SetCrashFD allows the main process to start a "monitor process" and set the fd accordingly, while the main process can handle the "service" responsibilities.

@alexbrainman
Copy link
Member

@alexbrainman it would behave in a similar manner that you can provide a fd that outputs to a file or a pipe. Flushing it to Event Log would require some external "crash reporting" service that reads that file or pipe. This isn't ideal, but would allow better handling than what can be currently done. Ideal scenario would be to write directly to event-log, however that would probably require specialized machinery.

@egonelbre I don't understand what you are suggesting.

If I have a Windows service written in Go, how can I redirect its crash dump to a file by using debug.SetCrashOutputFD(3) ? Let's say I want crash dump file to be called c:\a.txt.

Similarly, I don't see how using debug.SetCrashOutputFD(3) would help me write to Event Log.

Maybe we can use file path instead of file descriptor? Like debug.SetCrashOutput("/tmp/a.txt").

Alex

@aclements
Copy link
Member

@alexbrainman , for writing to a file, I imagine you would open the file and then pass that handle to debug.SetCrashOutputFD (or whatever). I don't know how the event log works, but I imagine you'd start a second process with a pipe between them and the second process would be responsible for writing to the event log.

debug.SetCrashFD allows the main process to start a "monitor process" and set the fd accordingly, while the main process can handle the "service" responsibilities.

Is there a reason this can't be done the other way around? The first process is the monitor process and opens the FD, then starts the second process with the GODEBUG environment set. Does this not work for Windows services?

(I'm not necessarily opposed to having a runtime/debug function for this, just trying to understand the space of constraints. One nice thing about the environment variable is that it can catch very early panics, like during package init.)

@zx2c4
Copy link
Contributor

zx2c4 commented Dec 2, 2020

@zx2c4 I'm not fond of documenting a callback to use with go:linkname either. I tend to think that people who operate at that level can customize their Go distribution.

Yea, documenting it is probably the wrong thing to do. But having something in there akin to nanotime would be "very nice"...and would allow me to implement something for Go's Windows service package that uses it to ship things to eventlog automatically.

Anyway, I realize pushing this too hard is an loosing battle, for good reason. But if somehow the implementation happened to be structured in a way that made it coincidentally possible to go:linkname it from elsewhere, I would be very happy.

@zx2c4
Copy link
Contributor

zx2c4 commented Dec 2, 2020

Is it possible for a parent process to create a handle and pass it to a child like this?

You can make a handle inheritable, and then pass its value as an argument to the new process. The value will remain the same in the new process. Alternatively, DuplicateHandleEx allows the one process to duplicate a handle into another process.

@egonelbre
Copy link
Contributor Author

@alexbrainman, so roughly what I'm thinking is the following. However, I might also forget some internal details that could make the following difficult with Windows.

The most basic thing is to send the output to a separate file:

package main

import (
	"os"
	"runtime/debug"
)

func main() {
	f, err := os.Open("service_log.txt")
	if err != nil {
		os.Exit(1)
		return
	}

	debug.SetCrashFD(f.Fd())

	// rest of the service logic
}

The next step would be to create a watchdog sub-process:

package main

import (
	"os"
	"os/exec"
	"runtime/debug"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		os.Exit(1)
		return
	}

	// this could also self-start, but in a different mode
	// but we'll use a separate binary for now
	cmd := exec.Command("watchdog.exe") 
	cmd.Stdin = r
	// also set other flags here to ensure that the watchdog
	// doesn't immediately close together with the parent.
	cmd.Start()

	debug.SetCrashFD(w.Fd())

	// rest of the service logic
}

You could also create governors using ExtraFiles:

package main

import (
	"os"
	"os/exec"
	"runtime/debug"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		os.Exit(1)
		return
	}

	cmd := exec.Command("watchdog.exe") 
	cmd.ExtraFiles = []*os.File{w}
	cmd.Start()

	// monitor `r`
}

// The other process has `debug.SetCrashFD(3)` in init,
// with some check that `3` actually exists.

For windows, I think (named) pipes could also be used:

f, err := os.Open(`\\.\pipe\randomhexvalue`)
...
debug.SetCrashFD(f.Fd())

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/547978 mentions this issue: runtime/debug: SetCrashOutput sets the FD for fatal panics

@adonovan adonovan self-assigned this Dec 7, 2023
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/548121 mentions this issue: runtime/debug: add Example use of SetCrashOutput for a watchdog

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/548735 mentions this issue: gopls/internal/lsp/protocol: report panics in RPC goroutines

gopherbot pushed a commit to golang/tools that referenced this issue Dec 11, 2023
This change adds recover() + bug.Reportf to the generated
dispatch routines for the server and client, so that we learn
through telemetry of unexpected panics in RPC goroutines.

Panics in other goroutines cannot yet be reported, but go1.23's
runtime/debug.SetCrashOutput feature (golang/go#42888) will
make that possible, with some extra work.

Change-Id: I893333d296096632fa47e3238a72b0ac48386375
Reviewed-on: https://go-review.googlesource.com/c/tools/+/548735
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Robert Findley <[email protected]>
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/558256 mentions this issue: gopls: a watchdog for unexpected crashes

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/559503 mentions this issue: crashmonitor: a monitor for unexpected crashes

@adonovan adonovan modified the milestones: Backlog, Go1.23 Jan 31, 2024
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/559800 mentions this issue: runtime/debug: clarify SetCrashOutput dup behavior

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/559801 mentions this issue: runtime: be careful about crash FD changes during panic

gopherbot pushed a commit to golang/telemetry that referenced this issue Feb 1, 2024
This change defines a crash monitor that increments
a telemetry counter for the stack any time the Go
runtime crashes.

It relies on the runtime/debug.SetCrashOutput feature
of go1.23 (#42888); if built with an earlier
version of Go, the monitor is a no-op. This feature
causes the Go runtime to write the crash to an arbitrary
file instead of stderr.

In this case, the file is a pipe to a subprocess that
executes the application executable in monitor mode.
When the pipe is closed, the monitor parses the
program counters out of the backtrace, then uses a
the usual logic to convert it to the name of a
counter, which it then increments.

The CL includes two tests, an internal test that parses
a crash from the child process's stderr, and an
integration test that uses crashmonitor.Start and
SetCrashOutput, if available.

Updates golang/go#42888

Change-Id: Ie40f9403fa99a9e0bab3b4edc9430be5e18150d7
Reviewed-on: https://go-review.googlesource.com/c/telemetry/+/559503
Auto-Submit: Alan Donovan <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Robert Findley <[email protected]>
gopherbot pushed a commit to golang/tools that referenced this issue Feb 2, 2024
This change enables the crash monitor in gopls, so that it
increments a telemetry counter for the stack any time
the Go runtime crashes.

x/telemetry cannot be built with go1.18, which gopls still
supports (though not for long). So we must ensure that there
are no direct references to x/telemetry from gopls; all
references must go through the build-tagged non-functional
shims, which avoids the dependency.

Also, internal/telemetry can no longer depend on protocol,
to avoid a cycle via protocol -> bug -> telemetry.

Updates golang/go#42888

Change-Id: I7a57b9a93ab76a58ec6dd73f895d4b42ed29ea86
Reviewed-on: https://go-review.googlesource.com/c/tools/+/558256
Reviewed-by: Robert Findley <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
Auto-Submit: Alan Donovan <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
gopherbot pushed a commit that referenced this issue Feb 2, 2024
Updates #42888

Change-Id: I72e408301e17b1579bbea189bed6b1a0154bd55f
Reviewed-on: https://go-review.googlesource.com/c/go/+/548121
Reviewed-by: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
This feature makes it possible to record unhandled panics
in any goroutine through a watchdog process (e.g. the same
application forked+exec'd as a child in a special mode)
that can process the panic report, for example by sending
it to a crash-reporting system such as Go telemetry
or Sentry.

Fixes golang#42888

Change-Id: I5aa7be8f726bbc70fc650540bd1a14ab60c62ecb
Reviewed-on: https://go-review.googlesource.com/c/go/+/547978
Reviewed-by: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Auto-Submit: Alan Donovan <[email protected]>
Reviewed-by: Russ Cox <[email protected]>
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
Updates golang#42888

Change-Id: I72e408301e17b1579bbea189bed6b1a0154bd55f
Reviewed-on: https://go-review.googlesource.com/c/go/+/548121
Reviewed-by: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
gopherbot pushed a commit that referenced this issue Feb 21, 2024
SetCrashOutput dup's the input file for safety, but I don't think that
the docs are very clear about what the caller can/should do with f. "it
does not close the previous file" is particularly confusing, as it does
close the previous FD (but not the previous passed os.File).

Expand and attempt to clarify the explanation, borrowing wording from
net.FileConn, which also dup's the input os.File.

For #42888.

Change-Id: I1c96d2dce7899e335d8f1cd464d2d9b31aeb4e5e
Reviewed-on: https://go-review.googlesource.com/c/go/+/559800
Reviewed-by: Alan Donovan <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
gopherbot pushed a commit that referenced this issue Feb 22, 2024
There are two separate cases here:

The base case is simple: a concurrent call to SetCrashOutput while
panicking will switch the crash FD, which could cause the first half of
writes to go to the old FD, and the second half to the new FD. This
isn't a correctness problem, but would be annoying to see in practice.
Since it is easy to check for, I simply drop any changes if panicking is
already in progress.

The second case is more important: SetCrashOutput will close the old FD
after the new FD is swapped, but writeErrData has no locking around use
of the fd, so SetCrashOutput could close the FD out from under
writeErrData, causing lost writes. We handle this similarly, by not
allowing SetCrashOutput to close the old FD if a panic is in progress,
but we have to be more careful about synchronization between
writeErrData and setCrashFD to ensure that writeErrData can't observe
the old FD while setCrashFD allows close.

For #42888.

Change-Id: I7270b2cc5ea58a15ba40145b7a96d557acdfe842
Reviewed-on: https://go-review.googlesource.com/c/go/+/559801
Reviewed-by: Alan Donovan <[email protected]>
Auto-Submit: Michael Pratt <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
@acastruc
Copy link

acastruc commented Oct 7, 2024

@adonovan I had been watching for this API to become available, though having used it I'm a bit disappointed.
Perhaps I didn't understand the use case: I had wanted my process to be able to catch crash output should one occur.
So I can keep an empty "crash.log" file around, that only gets content when a crash occurs. However, there's no date time data output. How would I cross reference when the crash occurred?
Also, the way it's implemented I cannot remove said crash file if nothing gets written. (I can on mac/Linux but not on Windows because it thinks the file handle is in use)
See my simple example below

var crashLog string
func main() {
	crashLog, _ = filepath.Abs("crash.log")
	f,err := os.OpenFile(crashLog, os.O_TRUNC|os.O_CREATE|os.O_WRONLY, os.ModePerm)
	if err != nil {
		fmt.Println("Error creating file:", err)
		return
	}
	fmt.Println("Created ", crashLog)
	debug.SetCrashOutput(f, debug.CrashOptions{})
	f.Close()

	//simulate crash
	//panic("ERROR")

	//We will only get here if there's no crash...
	cleanupCrashlog()
}
func cleanupCrashlog() {
	stat, err := os.Stat(crashLog)
	if err == nil {
		if stat.Size() == 0 {
			err := os.Remove("crash.log") //fails on Windows, because of win file handling
			if err != nil {
				fmt.Fprint(os.Stderr, "Failed to remove crash log: %v", err)
			} else {
				fmt.Println("Deleted crash log")
			}
		}
	}
}

Can I suggest we either

  1. write some datetime indicators in the crashoutput AND/OR
  2. make a CrashOptions field that users can specify "removeEmptyFile".
    Otherwise Im forced to create a crashlog file for every session (where 99% will be empty)
    or use a single crashlog where all crashes go but without any date time indicators.

Or am I just missing something? I have no need for a separate monitor process - I think a process should be able to catch its own crash output.

@adonovan
Copy link
Member

adonovan commented Oct 7, 2024

write some datetime indicators in the crash output

The file modification time records the time of the last write to the file, so you should be able to infer the crash time from it.

make a CrashOptions field that users can specify "removeEmptyFile".
Otherwise I'm forced to create a crashlog file for every session (where 99% will be empty)

Indeed, you will need to create a file for every process, and (one hopes) the vast majority will be empty. Presumably you will create them in a directory other than /tmp (where their directory entries would expire even while the process is still running, at least on Unix). Whatever program of yours monitors this directory can ignore the empty files and clean up however it sees fit.

Also, your cleanupCrashlog function is allowed to call SetCrashOutput(nil), after which it should be possible to remove the log file even on Windows.

@acastruc
Copy link

acastruc commented Oct 7, 2024

@adonovan

The file modification time records the time of the last write to the file, so you should be able to infer the crash time from it.

For the last crash only
See my scenario is an application that runs periodically. We want to be able to go back and see if there were any crashes at a later time. Ideally we could look in an application specific directory and just count the number of files to see how many crashes occurred. Empty directory == all good.

you will need to create a file for every process, and (one hopes) the vast majority will be empty.

I really do feel though this feature missed the mark. At least if SetCrashOutput() received an interface instead of os.File I feel like there'd be more opportunity to customize the behaviour. As you say, I suppose my process will have to self prune these empty crash logs...

Also, your cleanupCrashlog function is allowed to call SetCrashOutput(nil), after which it should be possible to remove the log file even on Windows.

There it is! I can make that work - thank you!

@adonovan
Copy link
Member

adonovan commented Oct 7, 2024

Empty directory == all good.

The directory is never going to be entirely empty because each running process will need a corresponding file. So either way, you would need to change this predicate to "Directory of only empty files == all good".

I really do feel though this feature missed the mark. At least if SetCrashOutput() received an interface instead of os.File I feel like there'd be more opportunity to customize the behaviour.

There's no way that the runtime can make a dynamic call through an interface while in the middle of a crashing. Observe that the implementation must write directly to a file descriptor; it doesn't use os.File.Write. That's the reason it's not an io.Writer.

@acastruc
Copy link

acastruc commented Oct 7, 2024

Thank you for the responses, I appreciate your time on this.

Empty directory == all good.

Sorry, I should have been clearer. I was referring to a manual, visible inspection of said directory. Which I think you'll agree is still easier to do on a single file (that's required for a running process) vs 1000s of leftover files

There's no way that the runtime can make a dynamic call through an interface while in the middle of a crashing.

I still think this makes a case for a future CrashOptions{} field to remove the crash file if it is empty.
I feel like it will be a common requirement where devs want to indicate "put crash info in log file X that exists iff a crash occurs".
Anyway, just my two cents, I differ to your expertise.
Thanks again.

@adonovan
Copy link
Member

adonovan commented Oct 7, 2024

I was referring to a manual, visible inspection of said directory. Which I think you'll agree is still easier to do on a single file (that's required for a running process) vs 1000s of leftover files

Agreed, but that sounds like a tedious manual job best automated and delegated to a computer. ;-) And there's no getting around the need for each running process to have an empty file.

If the program acted as its own crash monitor (using a pipe to a child process---see ExampleSetCrashOutput), it could defer the opening of a file until the monitored program actually crashed.

There's no way that the runtime can make a dynamic call through an interface while in the middle of a crashing.

I still think this makes a case for a future CrashOptions{} field to remove the crash file if it is empty. I feel like it will be a common requirement where devs want to indicate "put crash info in log file X that exists iff a crash occurs". Anyway, just my two cents, I differ to your expertise. Thanks again.

There's no way for the runtime to implement that behavior. It would require the runtime to somehow know that the process was never going to crash. The only code that knows that is your main function just before it calls os.Exit(0), assuming there are no other goroutines running. Even then, many long lived programs that you might want to monitor never voluntarily terminate: healthy web servers are often killed by the termination of their container.

Pinkle-pash added a commit to Pinkle-pash/telemetry that referenced this issue Dec 4, 2024
This change defines a crash monitor that increments
a telemetry counter for the stack any time the Go
runtime crashes.

It relies on the runtime/debug.SetCrashOutput feature
of go1.23 (#42888); if built with an earlier
version of Go, the monitor is a no-op. This feature
causes the Go runtime to write the crash to an arbitrary
file instead of stderr.

In this case, the file is a pipe to a subprocess that
executes the application executable in monitor mode.
When the pipe is closed, the monitor parses the
program counters out of the backtrace, then uses a
the usual logic to convert it to the name of a
counter, which it then increments.

The CL includes two tests, an internal test that parses
a crash from the child process's stderr, and an
integration test that uses crashmonitor.Start and
SetCrashOutput, if available.

Updates golang/go#42888

Change-Id: Ie40f9403fa99a9e0bab3b4edc9430be5e18150d7
Reviewed-on: https://go-review.googlesource.com/c/telemetry/+/559503
Auto-Submit: Alan Donovan <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Robert Findley <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Accepted
Development

No branches or pull requests