Add log group headers and timestamps to job verification success and failure logs #2461
Conversation
agent/run_job.go (outdated)
@@ -148,6 +157,50 @@ func (r *JobRunner) Run(ctx context.Context) error {
	return nil
}

func (r *JobRunner) prependTimestampForLogs(s string, args ...any) []byte {
Right now this func is both prepending stuff and also sprintf-ing its args... is there a reason that has to happen all in one go? Or can we sprintf prior to passing something to this func?
It just seems easier to call one method rather than two. If you want to sprintf prior to this method, this interface doesn't prevent that; if you don't want to, the suggested interface supports that too.

But I should rename the method: its name only suggests the prepending, not the sprintf-ing.
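For illustration, a minimal sketch of the suggested split, with formatting done by the caller and the helper only prepending. The helper name and timestamp format here are hypothetical, not the PR's actual code:

```go
package main

import (
	"fmt"
	"time"
)

// prependTimestamp only prepends; callers format their message beforehand.
// (Hypothetical helper illustrating the suggested interface.)
func prependTimestamp(s string) []byte {
	return []byte(fmt.Sprintf("[%s] %s", time.Now().Format(time.RFC3339), s))
}

func main() {
	// Sprintf first, then prepend, instead of doing both in one call.
	msg := fmt.Sprintf("verification failed: %v", "signature mismatch")
	fmt.Printf("%s\n", prependTimestamp(msg))
}
```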
s.Logger.Commentf("Job API server listening on %s", s.SocketPath)
s.Logger.Printf("~~~ Job API")
s.Logger.Printf("Server listening on %s", s.SocketPath)
👩🍳💋
agent/run_job.go (outdated)
	time.Now().UnixNano()/int64(time.Millisecond),
	fmt.Sprintf(s, args...),
))
case r.conf.AgentConfiguration.TimestampLines:
Can we have a single stream for processing job output + stuff the job runner wants to pretend is job output? Aside from duplicating the timestamping logic, pushing logs directly into `r.logStreamer` could make header times incorrect with non-ANSI timestamps (skipping the header times streamer), and might also skip the local log output file?
I don't think we can have a single stream. The verification logs happen in the `agent start` process, while most of the other logs are streamed from the `agent bootstrap` process. So I think the best we can do is have two streams that are created from a common method.

We could add something to the local API so that the bootstrap process can stream logs over it to the remote agent API, but creating that for this task seems like a massive yak shave.

> pushing logs directly into r.logStreamer could make header times incorrect with non-ANSI timestamps

I don't really want to carry around the legacy of non-ANSI timestamps. Are you happy for me to remove support for these in this PR?

> might also skip the local log output file?

Yeah, they probably will. I'll look into something for this, and a way to avoid duplicating the timestamp logic, but if the answer to the question above is in the affirmative, then only about one line will be duplicated.
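For reference, a sketch of the two timestamp styles under discussion. The escape sequence follows the `\x1b_bk;t=<ms>\x07` convention suggested by the millisecond arithmetic in the diff above, but treat both formats here as assumptions rather than the agent's authoritative implementation:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	line := "verifying job signature"

	// ANSI-style timestamp: an escape sequence carrying unix milliseconds,
	// parsed by the Buildkite UI and hidden from the rendered log. (Assumed format.)
	ansi := fmt.Sprintf("\x1b_bk;t=%d\x07%s", time.Now().UnixNano()/int64(time.Millisecond), line)

	// Non-ANSI (legacy) timestamp: a visible prefix baked into the log text. (Assumed format.)
	legacy := fmt.Sprintf("[%s] %s", time.Now().Format(time.RFC3339), line)

	fmt.Printf("%q\n%q\n", ansi, legacy)
}
```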
> I don't think we can have a single stream. The verification logs happen in the agent start process, while most of the other logs are streamed from the agent bootstrap process. So I think the best we can do is have two streams that are created from a common method.

Looks like I was wrong about this. I'll do the work to make a single stream.
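A minimal sketch of the single-stream idea: one `io.Writer` front door that applies the per-line transforms once and forwards to the streamer, used both for real job output and for logs the runner wants to appear in the job log. All names here are illustrative, not the PR's actual types:

```go
package main

import (
	"fmt"
	"io"
	"os"
	"time"
)

// timestampingWriter forwards everything it receives to out, prefixing each
// write with a millisecond timestamp, so the timestamping logic lives in
// exactly one place regardless of who produced the bytes.
type timestampingWriter struct {
	out io.Writer
}

func (w timestampingWriter) Write(p []byte) (int, error) {
	if _, err := fmt.Fprintf(w.out, "[%d] %s", time.Now().UnixMilli(), p); err != nil {
		return 0, err
	}
	return len(p), nil
}

func main() {
	// In the agent, out would be the log streamer (and the local log file);
	// os.Stdout stands in for it here.
	var jobLogs io.Writer = timestampingWriter{out: os.Stdout}

	fmt.Fprintln(jobLogs, "~~~ Verifying job signature") // runner-generated
	fmt.Fprintln(jobLogs, "hello from the job")          // job output
}
```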
Force-pushed from a8f8148 to 86f16ad
// The logger to use
logger logger.Logger
// agentLogger is a logger that outputs to the agent logs
agentLogger logger.Logger
To reduce confusion between agent logs and job logs, I renamed the `logger` field to make it clear that it is not the job logs.
@@ -131,9 +131,12 @@ type JobRunner struct {
	// The internal header time streamer
	headerTimesStreamer *headerTimesStreamer

	// The internal log streamer
	// The internal log streamer. Don't write to this directly, use `jobLogs` instead
I toyed with ways to enforce this, but ultimately they proved too invasive.
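One shape such enforcement could take, shown purely as a hypothetical (the PR does not do this): hand callers only the `io.Writer`, keeping the streamer behind an unexported sink so there's no exported path to `Process`.

```go
package main

import (
	"fmt"
	"io"
)

// logStreamer stands in for the agent's LogStreamer.
type logStreamer struct{}

func (ls *logStreamer) Process(output []byte) { fmt.Printf("chunk: %q\n", output) }

// jobLogSink is a hypothetical wrapper: code holding only the io.Writer
// cannot reach Process and bypass the intended write path.
type jobLogSink struct {
	streamer *logStreamer
}

func (s *jobLogSink) Write(p []byte) (int, error) {
	s.streamer.Process(p)
	return len(p), nil
}

func main() {
	var jobLogs io.Writer = &jobLogSink{streamer: &logStreamer{}}
	io.WriteString(jobLogs, "hello job log\n")
}
```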
@@ -102,7 +106,7 @@ func (ls *LogStreamer) FailedChunks() int {
}

// Process streams the output.
func (ls *LogStreamer) Process(output []byte) error {
func (ls *LogStreamer) Process(output []byte) {
The method always returns `nil`, and is not part of an interface.
}

// Waits for all the chunks to be uploaded, then shuts down all the workers
func (ls *LogStreamer) Stop() error {
func (ls *LogStreamer) Stop() {
Same
@@ -204,6 +242,10 @@ func (r *JobRunner) runJob(ctx context.Context) processExit {
func (r *JobRunner) cleanup(ctx context.Context, wg *sync.WaitGroup, exit processExit) {
	finishedAt := time.Now()

	// Flush the job logs. These should have been flushed already if the process started, but if it
	// never started, then logs from prior to the attempt to start the process will still be buffered.
	r.logStreamer.Process(r.output.ReadAndTruncate())
Realising that we need to do this took too long.
// every few seconds and sends it back to Buildkite.
func (r *JobRunner) jobLogStreamer(ctx context.Context, wg *sync.WaitGroup) {
func (r *JobRunner) streamJobLogsAfterProcessStart(ctx context.Context, wg *sync.WaitGroup) {
Renamed this method to make it clearer that it waits for the process to start before shipping logs.
	r.output.Close()
}

// Send the output of the process to the log streamer for processing
r.logStreamer.Process(r.output.ReadAndTruncate())
Because `r.logStreamer.Process` can never error, this is greatly simplified.
We can easily resurrect the error return once we start enforcing job log limits.
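A sketch of what resurrecting the error return could look like once log limits exist. The limit field, byte accounting, and error value are all assumptions for illustration, not the agent's actual design:

```go
package main

import (
	"errors"
	"fmt"
)

var ErrLogLimitExceeded = errors.New("job log limit exceeded")

// LogStreamer is a stub with an assumed byte limit, sketching how Process
// could regain its error return when job log limits are enforced.
type LogStreamer struct {
	limit     int
	bytesSeen int
}

func (ls *LogStreamer) Process(output []byte) error {
	if ls.bytesSeen+len(output) > ls.limit {
		return ErrLogLimitExceeded
	}
	ls.bytesSeen += len(output)
	// ... chunk the output and enqueue it for upload ...
	return nil
}

func main() {
	ls := &LogStreamer{limit: 10}
	fmt.Println(ls.Process([]byte("hello")))      // <nil>
	fmt.Println(ls.Process([]byte("world!!!!!"))) // job log limit exceeded
}
```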
Force-pushed from c1a85e9 to 4eaf8ad
awesome work!
// jobLogs is an io.Writer that sends data to the job logs
jobLogs io.Writer
🥳
☘️
Co-authored-by: Ben Moskovitz <[email protected]>
Force-pushed from 94cf78d to 4619a37
I also had to add a header to the Job API logs, as otherwise they would appear to be part of the verification log group. There are probably other logs like this; we'll need to weed them out eventually.

I also simplified some of the logging. In particular, the signature is no longer printed as part of the log group header, but on a subsequent line. Since the log group will be collapsed unless there is a failure and the agent is configured to block, the signature will not be immediately visible.
Also, for the agent logs, I tried to add fields using the `logger.WithFields` method. These will be displayed in logfmt format (or JSON, if configured to do so), making the logs more structured.
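The agent's own `logger` package renders these fields; as a runnable stand-in, here is the same idea with the standard library's `log/slog` (the field names are made up for illustration):

```go
package main

import "log/slog"

func main() {
	// Attach structured fields once; every subsequent line carries them as
	// key=value pairs (logfmt-style) or JSON, depending on the handler.
	l := slog.Default().With(
		slog.String("job", "0189-example"),
		slog.String("verification", "failed"),
	)
	l.Warn("job verification failed, blocking job")
}
```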
Finally, I've removed the use of `(*JobRunner).logStreamer.Process` to send logs directly to the API, as it bypasses various transformations that happen beforehand (like timestamping). Users will still be able to call `logStreamer.Process`, but should use `(*JobRunner).jobLogs`, which exposes an `io.Writer` interface, instead.

Screenshots
Verification Failure Warning
Before
After
Verification Failure Block
Before
After
Verification Success
Before
After