Automatically reattempt after internal errors #1424

nicktrn · 2024-10-22T09:54:10Z

Other changes:

Better error detection for ffmpeg OOM, segfault
Unify failed and crashed run logic via complete attempt service
The retry config stored for tasks now respects the global retry.default
Fix timeline event color for failed runs
Container cleanup after fatal errors
Retry fetch for large payload packet import and export

Things to test:

Summary by CodeRabbit

New Features
- Enhanced error handling and status evaluation in the Run Inspector component.
- Improved task run attempt creation with structured parameter handling.
- Added retry mechanisms for failed task runs and crash scenarios.
- New constants and functions for better task status categorization.
- Introduced a new function for safe JSON parsing from API responses.
- Enhanced task configuration handling with default retry settings.
Bug Fixes
- Streamlined error logging and handling in various services.
Documentation
- Updated package dependencies for improved functionality.
Style
- Improved code clarity and structure across multiple components.
Tests
- Enhanced error handling logic for more robust task execution.

* catch all the remaining socket acks that could possibly throw * wrap the remaining handlers in try catch

* Updated “Twitter” to be “X (Twitter)” * added Textarea to storybook * Updated textarea styling to match input field * WIP adding new text field to org creation page * Added description to field * Submit feedback to Plain when an org signs up * Formatting improvement * type improvement * removed userId * Moved submitting to Plain into its own file * Change orgName with name * use sendToPlain function for the help & feedback email form * use name not orgName * import cleanup * Downgrading plan form uses sendToPlain * Get the userId from requireUser only * Added whitespace-pre-wrap to the message property on the run page * use requireUserId * Removed old Plain submit code

* Added a new context page with task context properties * Removed code comments * Added more crosslinks

…lly (#1417) * doc: add instructions to create new reference project and run it locally * doc: Add instruction for running tunnel * minor language improvement

* try to correct resume messages with missing checkpoint * prevent creating checkpoints for outdated task waits * prevent creating checkpoints for outdated batch waits * use heartbeats to check for and clean up any leftover containers * lint * improve exec logging * improve resume attempt logs * fix for resuming parents of canceled child runs * separate SIGTERM from maybe OOM errors * pretty errors can have magic dashboard links * prevent uncancellable checkpoints * simplify task run error code enum export * grab the last, not the first child run * Revert "prevent creating checkpoints for outdated batch waits" This reverts commit f2b5c2a. * Revert "grab the last, not the first child run" This reverts commit 89ec5c8. * Revert "prevent creating checkpoints for outdated task waits" This reverts commit 11066b4. * more logs for resume message handling * add magic error link comment * add changeset

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

changeset-bot · 2024-10-22T09:54:13Z

🦋 Changeset detected

Latest commit: d7b246e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages

Name	Type
trigger.dev	Patch
@trigger.dev/core	Patch
@trigger.dev/build	Patch
@trigger.dev/sdk	Patch
@internal/redis-worker	Patch
@internal/zod-worker	Patch
@trigger.dev/react-hooks	Patch
@trigger.dev/database	Patch
@trigger.dev/otlp-importer	Patch
@internal/testcontainers	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

coderabbitai · 2024-10-22T09:54:19Z

Walkthrough

The pull request introduces several enhancements across multiple components related to task run management, focusing on improving error handling and run status evaluation. Key changes include the addition of new status checks and error logging mechanisms in the RunInspector, SpanPresenter, and FailedTaskRunService components. The CreateTaskRunAttemptService and FinalizeTaskRunService have been updated to support more flexible parameter handling and refined error management. Additionally, new constants and functions for categorizing task statuses have been introduced, enhancing the overall structure and clarity of task execution workflows.

Changes

File Path	Change Summary
apps/webapp/app/components/runs/v3/RunInspector.tsx	Enhanced run status handling with new `isError` variable; improved error reporting in `RunError` function.
apps/webapp/app/presenters/v3/SpanPresenter.server.ts	Added `isError` property to `call` method for run status evaluation.
apps/webapp/app/routes/api.v1.runs.$runParam.attempts.ts	Refactored parameter handling in `action` function for service call.
apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam.spans.$spanParam/route.tsx	Updated `RunTimelineEvent` to reflect dynamic error state in timeline.
apps/webapp/app/v3/failedTaskRun.server.ts	Introduced `FailedTaskRunRetryHelper` class; enhanced retry handling and logging in `FailedTaskRunService`.
apps/webapp/app/v3/handleSocketIo.server.ts	Updated `CREATE_TASK_RUN_ATTEMPT` handler to accept structured parameters; improved error logging.
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts	Enhanced task run attempt handling; refined error checking and logging.
apps/webapp/app/v3/services/completeAttempt.server.ts	Added parameters for nuanced error handling; improved retry logic with `FailedTaskRunRetryHelper`.
apps/webapp/app/v3/services/crashTaskRun.server.ts	Enhanced logging and retry mechanism for crashed task runs.
apps/webapp/app/v3/services/createTaskRunAttempt.server.ts	Updated `call` method to accept structured parameters for flexibility.
apps/webapp/app/v3/services/finalizeTaskRun.server.ts	Improved handling of fatal run statuses; refined error handling logic.
apps/webapp/app/v3/taskStatus.ts	Restructured run and attempt status constants and functions for clarity.
apps/webapp/package.json	Added new dependencies for version management.
internal-packages/database/prisma/schema.prisma	Expanded enums for task run statuses to include new states.
packages/cli-v3/src/entryPoints/deploy-index-worker.ts	Enhanced task configuration handling with new type safety.
packages/cli-v3/src/entryPoints/dev-index-worker.ts	Improved task management logic with default retry settings.
packages/core/src/v3/errors.ts	Added `shouldRetryError` function for enhanced error handling.
packages/core/src/v3/schemas/common.ts	Expanded `TaskRunInternalError` schema with new error codes.
packages/core/src/v3/utils/ioSerialization.ts	Introduced retry logic for network requests using `zodfetch`.
packages/core/src/v3/workers/taskExecutor.ts	Enhanced error handling in `execute` method for payload import and parsing.

Suggested reviewers

ericallam
matt-aitken

🐇 In the garden, where the changes bloom,
A rabbit hops, dispelling gloom.
With statuses clear and errors refined,
Task runs now dance, perfectly aligned.
Let’s celebrate this code delight,
Hopping through the day and into the night! 🌼✨

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 13

🧹 Outside diff range and nitpick comments (14)

apps/webapp/app/routes/api.v1.runs.$runParam.attempts.ts (2)
32-35: LGTM! Consider destructuring for consistency.

The refactoring to use an object parameter for the CreateTaskRunAttemptService.call() method improves code structure and aligns with the service updates. This change enhances readability and maintainability.

For consistency with the destructuring used in the function parameters, consider applying this minor change:
-    const { execution } = await service.call({
-      runId: runParam,
-      authenticatedEnv: authenticationResult.environment,
-    });
+    const { execution } = await service.call({
+      runId: runParam,
+      authenticatedEnv: authenticationResult.environment,
+    });
This small adjustment maintains a consistent style throughout the function.

Line range hint 1-53: Overall, the changes improve code structure without altering core functionality.

The refactoring of the CreateTaskRunAttemptService.call() method parameter aligns well with the broader changes in error handling and status evaluation mentioned in the PR objectives. The existing error handling remains robust, ensuring that the function continues to handle invalid API keys and malformed parameters appropriately.

As the system evolves, consider implementing a more granular error handling mechanism that can distinguish between different types of internal errors (e.g., OOM errors, segfaults) as mentioned in the PR objectives. This could involve passing more detailed error information from the service layer to the API response, allowing clients to handle specific error cases more effectively.
packages/cli-v3/src/entryPoints/dev-index-worker.ts (1)
103-115: LGTM: Default retry configuration logic added

The new logic for applying default retry settings to tasks is well-implemented and aligns with the PR objectives. It correctly handles cases where tasks may not have retry settings defined.

A minor optimization suggestion:

Consider using the nullish coalescing operator (??) to simplify the logic:
tasks = tasks.map((task) => ({
  ...task,
  retry: task.retry ?? config.retries?.default,
}));
This approach would eliminate the need for the nested if statement while achieving the same result.
apps/webapp/app/v3/services/createTaskRunAttempt.server.ts (2)
15-25: Improved method signature with object parameter.

The change to use an object parameter enhances flexibility and readability. The addition of the startAtZero parameter provides more control over the initial attempt number, which aligns well with the PR objectives for improving error handling and retry mechanisms.

Consider adding JSDoc comments to describe the purpose and behavior of the new startAtZero parameter for better documentation.

111-115: Enhanced logic for determining nextAttemptNumber.

The updated logic for calculating nextAttemptNumber now considers the startAtZero parameter, providing more flexibility in managing task run attempts. This change aligns well with the PR objectives for improving error handling and retry mechanisms.

Consider using a more explicit conditional expression for better readability:
const nextAttemptNumber = taskRun.attempts[0]
  ? taskRun.attempts[0].number + 1
  : (startAtZero ? 0 : 1);
This minor change could make the logic more immediately clear to other developers.
apps/webapp/app/presenters/v3/SpanPresenter.server.ts (1)
297-297: LGTM: Addition of isError property enhances error detection.

The new isError property, utilizing isFailedRunStatus(run.status), provides valuable information about the run's error state. This addition aligns well with the PR's goal of improving error detection and handling.

For improved clarity and maintainability, consider extracting this logic into a separate method within the class. For example:
private isErrorStatus(status: string): boolean {
  return isFailedRunStatus(status);
}

// Then in the return statement:
isError: this.isErrorStatus(run.status),
This approach would make it easier to modify or extend the error detection logic in the future if needed.
apps/webapp/app/v3/handleSocketIo.server.ts (1)
196-200: LGTM! Consider adding a comment for clarity.

The update to use an object parameter in the CreateTaskRunAttemptService call is a good improvement. It aligns with the PR objectives and enhances code flexibility.

Consider adding a brief comment explaining why setToExecuting is set to false here, as it might not be immediately obvious to future readers. For example:
const { attempt } = await service.call({
  runId: message.runId,
  authenticatedEnv: environment,
  setToExecuting: false, // Execution state will be set later when retrieving the payload
});
apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam.spans.$spanParam/route.tsx (2)
860-860: Improved error state handling in RunTimelineEvent

The change enhances the RunTimelineEvent component by dynamically setting the state based on whether the run has encountered an error. This improvement provides more accurate visual feedback to users about the run's status.

Consider extracting the state logic into a separate function for better readability and maintainability. For example:
const getEventState = (isError: boolean) => isError ? "error" : "complete";

<RunTimelineEvent
  title="Finished"
  subtitle={<DateTimeAccurate date={run.updatedAt} />}
  state={getEventState(run.isError)}
/>
Line range hint 848-862: Enhance type safety in RunTimeline component

While the changes look good, consider improving type safety in the RunTimeline component to reduce potential runtime errors.

Consider using TypeScript's non-null assertion operator or optional chaining to handle potential undefined values. For example:
{run.isFinished ? (
  <>
    <RunTimelineLine
      title={formatDuration(run.startedAt!, run.updatedAt)}
      state={"complete"}
    />
    <RunTimelineEvent
      title="Finished"
      subtitle={<DateTimeAccurate date={run.updatedAt} />}
      state={run.isError ? "error" : "complete"}
    />
  </>
) : (
  <RunTimelineLine
    title={
      <span className="flex items-center gap-1">
        <Spinner className="size-4" />
        <span>
          <LiveTimer startTime={run.startedAt!} />
        </span>
      </span>
    }
    state={"inprogress"}
  />
)}
This change assumes that run.startedAt is always defined when run.isFinished is true. If this assumption is not always true, consider adding additional checks or using optional chaining (run.startedAt?.).
packages/cli-v3/src/entryPoints/deploy-index-worker.ts (1)
110-110: Re-evaluate the necessity of satisfies TaskManifest assertions

The satisfies TaskManifest assertion is used multiple times when returning modified task objects. If the objects already conform to TaskManifest, these assertions might be redundant.

Consider removing the assertions if they are unnecessary:
-          } satisfies TaskManifest;
+          };
Also applies to: 126-126, 140-140
packages/core/src/v3/workers/taskExecutor.ts (2)
Line range hint 351-476: Avoid variable shadowing of 'delay' within nested scopes.

In the block starting at line 456, the variable delay is redefined inside a nested scope. This can lead to confusion and potential errors. Consider renaming the inner delay variable to improve code clarity.

Apply this diff to rename the variable:
 if (handleErrorResult.retry && typeof handleErrorResult.retry === "object") {
-  const delay = calculateNextRetryDelay(handleErrorResult.retry, execution.attempt.number);
+  const nextDelay = calculateNextRetryDelay(handleErrorResult.retry, execution.attempt.number);
 
   return typeof nextDelay === "undefined"
     ? { status: "noop", error: handleErrorResult.error }
     : {
         status: "retry",
-        retry: { timestamp: Date.now() + delay, delay },
+        retry: { timestamp: Date.now() + nextDelay, delay: nextDelay },
         error: handleErrorResult.error,
       };
 }
Line range hint 351-476: Refactor nested ternary operators for better readability.

The use of nested ternary operators can reduce code readability and maintainability. Refactoring them into if-else statements will enhance clarity.

For example, refactor the code starting at line 423:
 if (!handleErrorResult) {
-  return typeof delay === "undefined"
-    ? { status: "noop" }
-    : { status: "retry", retry: { timestamp: Date.now() + delay, delay } };
+  if (typeof delay === "undefined") {
+    return { status: "noop" };
+  } else {
+    return { status: "retry", retry: { timestamp: Date.now() + delay, delay } };
+  }
 }
Similarly, refactor other nested ternary operations within the #handleError method.
apps/webapp/app/v3/services/completeAttempt.server.ts (1)

293-294: Address the TODO comment regarding default crash behavior

There's a TODO comment indicating uncertainty about defaulting internal errors to "CRASHED" status. Resolving this will clarify the intended error-handling strategy.
internal-packages/database/prisma/schema.prisma (1)
1966-1970: Align comment formatting for consistency.

In the TaskRunAttemptStatus enum, consider matching the comment style used in the TaskRunStatus enum by adding surrounding slashes and blank lines. This ensures consistency and enhances readability.

Apply this diff to align the comment formatting:
+      ///
       /// NON-FINAL
+      ///
       PENDING
       EXECUTING
       PAUSED
+      ///
       /// FINAL
+      ///
       FAILED
       CANCELED
       COMPLETED

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 768036a and 8bb7861.

⛔ Files ignored due to path filters (1)

pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml

📒 Files selected for processing (20)

apps/webapp/app/components/runs/v3/RunInspector.tsx (3 hunks)
apps/webapp/app/presenters/v3/SpanPresenter.server.ts (2 hunks)
apps/webapp/app/routes/api.v1.runs.$runParam.attempts.ts (1 hunks)
apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam.spans.$spanParam/route.tsx (1 hunks)
apps/webapp/app/v3/failedTaskRun.server.ts (3 hunks)
apps/webapp/app/v3/handleSocketIo.server.ts (1 hunks)
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (1 hunks)
apps/webapp/app/v3/services/completeAttempt.server.ts (10 hunks)
apps/webapp/app/v3/services/crashTaskRun.server.ts (4 hunks)
apps/webapp/app/v3/services/createTaskRunAttempt.server.ts (2 hunks)
apps/webapp/app/v3/services/finalizeTaskRun.server.ts (3 hunks)
apps/webapp/app/v3/taskStatus.ts (2 hunks)
apps/webapp/package.json (2 hunks)
internal-packages/database/prisma/schema.prisma (3 hunks)
packages/cli-v3/src/entryPoints/deploy-index-worker.ts (4 hunks)
packages/cli-v3/src/entryPoints/dev-index-worker.ts (2 hunks)
packages/core/src/v3/errors.ts (6 hunks)
packages/core/src/v3/schemas/common.ts (2 hunks)
packages/core/src/v3/utils/ioSerialization.ts (3 hunks)
packages/core/src/v3/workers/taskExecutor.ts (1 hunks)

🧰 Additional context used

🪛 Biome

packages/core/src/v3/errors.ts

[error] 155-189: This case is falling through to the next case.

Add a break or return statement to the end of this case to prevent fallthrough.

(lint/suspicious/noFallthroughSwitchClause)

🔇 Additional comments (52)

packages/cli-v3/src/entryPoints/dev-index-worker.ts (3)

6-6: LGTM: TaskManifest import added

The addition of the TaskManifest type import is appropriate and aligns with the changes made later in the file for improved type checking of task configurations.

124-124: LGTM: Type assertion added to maxDuration logic

The addition of satisfies TaskManifest to the maxDuration logic enhances type safety and consistency. This change aligns well with the similar modification in the retry configuration logic.

Line range hint 1-170: Summary: Improvements in task configuration and type safety

The changes in this file successfully implement the following improvements:

Addition of default retry configuration for tasks

Enhanced type checking using the TaskManifest type

Consistent use of satisfies keyword for type assertions

These modifications align well with the PR objectives of improving error handling and enhancing the robustness of the task management system. The code is well-structured and maintains good practices for type safety and configuration management.

packages/core/src/v3/schemas/common.ts (1)

92-92: LGTM: New error codes enhance error handling capabilities.

The addition of "TASK_PROCESS_SIGSEGV" and "IMPORT_PAYLOAD_ERROR" to the TaskRunInternalError schema aligns well with the PR objectives. These new error codes will improve the system's ability to detect and handle specific error scenarios, particularly segmentation faults and issues with importing large payload packets.

Also applies to: 106-106

apps/webapp/app/v3/services/createTaskRunAttempt.server.ts (1)

Line range hint 1-280: Overall assessment: Changes align well with PR objectives.

The modifications to the CreateTaskRunAttemptService class, particularly the updated call method, effectively enhance the flexibility and control over task run attempts. These changes are consistent with the PR objectives of improving error handling and retry mechanisms.

Key improvements:

The new object parameter structure in the call method enhances code readability and flexibility.

The addition of the startAtZero parameter allows for more nuanced control over initial attempt numbering.

The updated logic for nextAttemptNumber calculation accommodates various retry scenarios.

These enhancements contribute to a more robust and configurable system for managing task run attempts, which should lead to improved error handling and system reliability.
apps/webapp/package.json (1)
167-167: Approve the addition of semver and its type definitions.

The addition of semver and @types/semver is a good practice, especially in a TypeScript project. This suggests an intention to use version comparison functionality, which could be beneficial for error handling or compatibility checks.

Could you please clarify the intended use of the semver library in the context of this PR's objectives? This will help ensure it aligns with the error handling and retry mechanism improvements mentioned in the PR description.

To verify the usage of semver in the codebase, you can run the following script:

This will help us understand how semver is being utilized across the project.

Also applies to: 217-217

✅ Verification successful

Verified the usage of semver aligns with the PR objectives.

Uses of semver found in:

apps/webapp/app/v3/failedTaskRun.server.ts

packages/cli-v3/src/commands/update.ts
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Search for semver usage in the codebase

# Look for import statements or require calls for semver
echo "Searching for semver imports or requires:"
rg -i '(import.*from.*semver|require.*semver)' --type ts --type js

# Look for direct usage of semver functions
echo "\nSearching for semver function usage:"
rg -i 'semver\.(valid|satisfies|gt|lt|eq)' --type ts --type js
Length of output: 788
apps/webapp/app/presenters/v3/SpanPresenter.server.ts (2)

10-10: LGTM: Import addition aligns with PR objectives.

The addition of isFailedRunStatus import from the taskStatus module is consistent with the PR's goal of improving error detection and handling. This function will likely be used to determine if a run status indicates a failure, contributing to the unification of logic for handling failed and crashed runs.

Line range hint 1-362: Summary: Enhancements to SpanPresenter improve error detection capabilities.

The changes made to SpanPresenter.server.ts effectively contribute to the PR's objective of improving error handling and detection. By introducing the isFailedRunStatus function and the isError property, the class now provides more detailed and accessible information about run error states. These modifications align well with the overall goal of unifying logic for handling failed and crashed runs across the system.

These enhancements will likely improve the system's ability to detect and respond to errors, potentially leading to more robust error handling and better user feedback. As part of the larger set of changes in this PR, these modifications to SpanPresenter play a crucial role in achieving the stated objectives.

apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam.spans.$spanParam/route.tsx (2)

860-860: Alignment with PR objectives: Enhanced error handling and visual clarity

This change directly contributes to the PR's objectives by improving the visual representation of errors in the run timeline. It enhances the user experience by providing more accurate and clear feedback about the run's status, particularly when errors occur.

Line range hint 1-862: Summary: Improved error handling in RunTimelineEvent

The changes in this file successfully enhance the error handling and visual representation of run statuses in the timeline. The modification aligns well with the PR objectives of improving error handling and visual clarity. While the implementation is sound, consider the suggested minor optimizations for improved code maintainability and type safety.
apps/webapp/app/v3/taskStatus.ts (2)
Line range hint 3-116: Overall implementation is well-structured and clear

The categorization of task run and attempt statuses using constants and corresponding type definitions enhances code readability and maintainability. The utility functions provided for status checks are consistent and align with the new status groupings.

56-56: ⚠️ Potential issue

Typo in type definition of FATAL_RUN_STATUSES

The type FATAL_RUN_STATUSES is incorrectly defined using FAILED_RUN_STATUSES instead of FATAL_RUN_STATUSES. This could lead to type inconsistencies and unexpected behavior when using this type elsewhere in the codebase.

Apply this diff to fix the typo:
-export type FATAL_RUN_STATUSES = (typeof FAILED_RUN_STATUSES)[number];
+export type FATAL_RUN_STATUSES = (typeof FATAL_RUN_STATUSES)[number];
Likely invalid or redundant comment.
packages/cli-v3/src/entryPoints/deploy-index-worker.ts (1)

6-6: Importing TaskManifest type ensures type safety

The addition of type TaskManifest to the imports enhances type checking for task-related operations.

apps/webapp/app/v3/services/crashTaskRun.server.ts (11)

9-9: Importing FailedTaskRunRetryHelper is Necessary

The addition of FailedTaskRunRetryHelper import is appropriate for implementing the retry mechanism for failed task runs.

39-40: Enhanced Error Logging for Missing Task Runs

Good job adding error logging when the task run is not found. This will aid in debugging issues related to missing task runs.

45-48: Improved Logging for Non-Crashable Task Runs

Adding detailed error logs when a task run is not in a crashable state enhances traceability and debugging capability.

52-52: Consistent Debug Logging

Logging the commencement of attempt completion helps in tracing the execution flow and is a good practice.

54-67: Proper Implementation of Retry Mechanism

The integration of FailedTaskRunRetryHelper to handle retries is well-structured. The error object within the completion parameter is correctly populated with relevant details.

69-74: Correct Handling of Retry Results

Appropriately checking if the task run was retried and returning early ensures that the logic flows correctly after a retry.

76-78: Conditional Return Based on overrideCompletion

The check for opts.overrideCompletion before proceeding further maintains the integrity of the override process.

80-80: Logging Override of Task Run Completion

Logging when overriding the task run completion provides transparency and aids in debugging.

123-125: Debug Logging for Crashing In-Progress Events

Including the IDs of in-progress events being crashed enhances visibility and facilitates easier tracking of crashed events.

172-177: Setting Tracing Attributes for Better Observability

Annotating the span with taskRunId and attemptId improves observability in distributed tracing systems.

179-193: Ensure Proper Sanitization of Error Logs

When updating taskRunAttempt, the error object includes stackTrace: error.logs. To prevent potential leakage of sensitive information, please ensure that error.logs is appropriately sanitized before being assigned to stackTrace.

If sanitizeError() adequately sanitizes error.logs, no further action is needed. Otherwise, consider sanitizing error.logs prior to passing it to sanitizeError().
apps/webapp/app/v3/services/finalizeTaskRun.server.ts (4)
6-11: Imports updated to handle fatal run statuses

The addition of isFatalRunStatus and FINAL_ATTEMPT_STATUSES imports allows the service to properly detect and handle fatal run statuses.

16-16: Importing socketIo for run cancellation signaling

The import of socketIo is necessary for emitting run cancellation requests when a fatal status is detected.

99-133: Handling fatal run statuses and requesting worker exit

The added code block correctly handles scenarios where the run status is fatal. It logs the error, retrieves the extended run information, and, if appropriate, emits a REQUEST_RUN_CANCELLATION event via socketIo to signal a worker exit.

156-159: Verify that attemptStatus or error is always provided to finalizeAttempt

The early return ensures that finalizeAttempt does not proceed without an attemptStatus or error. Please verify that all calls to finalizeAttempt provide at least one of these parameters to prevent unintended skips in attempt finalization.

✅ Verification successful

Verified: All calls to finalizeAttempt provide either attemptStatus or error as required.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify that all calls to `finalizeAttempt` provide either `attemptStatus` or `error`.

# Search for all calls to `finalizeAttempt` and check for provided parameters
rg 'finalizeAttempt$\{[^}]*\}$' -t ts

# Expected: Each call should include `attemptStatus` or `error` in the arguments.
Length of output: 160
packages/core/src/v3/utils/ioSerialization.ts (2)

7-8: Imports are appropriate and correctly included

The new imports for zodfetch and z are necessary for the updated functionality using zodfetch for fetching and Zod for validation.

119-126: Verify that the retry options align with system requirements

Ensure that the configured retry options in ioRetryOptions are appropriate for the expected network conditions and service-level agreements (SLAs). Specifically, confirm that the minimum and maximum timeouts, maximum attempts, backoff factor, and randomization settings are suitable for handling transient network failures without causing excessive delay or load on the system.

packages/core/src/v3/workers/taskExecutor.ts (1)

96-114: Proper error handling for payload import failures.

The added try-catch block around conditionallyImportPacket correctly captures exceptions and returns a structured error response, enhancing the robustness of the payload import process.

apps/webapp/app/v3/services/completeAttempt.server.ts (15)

11-11: Importing shouldRetryError for enhanced error handling

The addition of shouldRetryError improves error evaluation for retry logic.

25-25: Importing status utilities from taskStatus

Including FAILED_RUN_STATUSES, isFinalAttemptStatus, and isFinalRunStatus enhances the status handling mechanisms.

28-28: Importing FailedTaskRunRetryHelper for retry management

The introduction of FailedTaskRunRetryHelper assists in managing retries for failed task runs effectively.

44-53: Adding optional parameters isSystemFailure and isCrash

Incorporating isSystemFailure and isCrash as optional parameters allows for more granular handling of failure scenarios.

119-120: Passing failure indicators to #completeAttemptFailed

Ensuring isSystemFailure and isCrash are forwarded maintains consistency in failure handling across methods.

181-191: Updating #completeAttemptFailed to accept new parameters

Modifying #completeAttemptFailed to include isSystemFailure and isCrash aligns it with the updated error-handling logic.

227-236: Enhancing retry determination logic

Utilizing FailedTaskRunRetryHelper.getExecutionRetry as a fallback ensures retries are appropriately managed even when completion.retry is undefined.

238-251: Implementing conditional retry logic

The conditions for retrying failed attempts are well-defined, preventing unnecessary retries and adhering to maximum attempt limitations.

253-270: Completing task run events on non-retry failures

Properly finalizing task run events and logging exceptions ensures accurate tracking of failures.

281-298: Setting task run status based on failure type

The logic for determining the status variable allows for precise categorization of failure types, improving error reporting.

311-358: Handling in-progress events based on final status

The switch-case structure effectively manages in-progress events according to whether the status is "CRASHED" or "SYSTEM_FAILURE," ensuring proper cleanup.

364-372: Updating #enqueueReattempt parameters

Adding executionRetry to the method enhances its ability to schedule retries accurately.

424-501: Creating the #retryAttempt method for retry logic

Encapsulating retry functionality within #retryAttempt improves code organization and readability.

503-557: Refining #retryAttemptWithCheckpoint

Adjustments to accommodate new retry mechanisms while maintaining checkpoint functionality are well-implemented.

592-596: Selecting sdkVersion in the findAttempt function

Including sdkVersion prepares for future enhancements that may require SDK version information.

apps/webapp/app/components/runs/v3/RunInspector.tsx (3)

43-43: Importing run status helper functions

Great addition of isFailedRunStatus and isFinalRunStatus imports to enhance run status evaluation.

482-482: Defining isError improves readability

Introducing the isError constant simplifies checking whether the run status indicates a failure, enhancing code clarity.

539-539: Conditional rendering based on error state

Using isError to set the state prop ensures that the timeline accurately reflects error statuses by displaying the correct visual state.
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (1)
512-515: ⚠️ Potential issue

Possible incorrect parameter name in CreateTaskRunAttemptService.call()

It appears that the CreateTaskRunAttemptService.call() method has been updated to accept a parameter startAtZero. In this code segment, setToExecuting: false is passed instead. Please verify if setToExecuting should be replaced with startAtZero to match the updated method signature.

Apply this diff to update the parameter name:
-                  await service.call({
-                    runId: lockedTaskRun.id,
-                    setToExecuting: false,
-                  });
+                  await service.call({
+                    runId: lockedTaskRun.id,
+                    startAtZero: false,
+                  });
Run the following script to find all usages of CreateTaskRunAttemptService.call() and check for consistent parameter names:
internal-packages/database/prisma/schema.prisma (2)

1804-1805: Good use of comments to categorize enum statuses.

The addition of NON-FINAL STATUSES comments enhances code readability by clearly grouping the non-terminal task statuses.

1828-1831: Clear separation of final statuses aids understanding.

Including FINAL STATUSES comments provides a distinct separation between non-final and final statuses, improving maintainability and clarity.

packages/cli-v3/src/entryPoints/deploy-index-worker.ts

apps/webapp/app/v3/services/finalizeTaskRun.server.ts

apps/webapp/app/v3/failedTaskRun.server.ts

packages/core/src/v3/utils/ioSerialization.ts

packages/core/src/v3/errors.ts

pkg-pr-new · 2024-10-22T10:01:44Z

trigger.dev

pnpm add https://pkg.pr.new/triggerdotdev/trigger.dev@1424

@trigger.dev/build

pnpm add https://pkg.pr.new/triggerdotdev/trigger.dev/@trigger.dev/build@1424

@trigger.dev/core

pnpm add https://pkg.pr.new/triggerdotdev/trigger.dev/@trigger.dev/core@1424

@trigger.dev/sdk

pnpm add https://pkg.pr.new/triggerdotdev/trigger.dev/@trigger.dev/sdk@1424

@trigger.dev/react-hooks

pnpm add https://pkg.pr.new/triggerdotdev/trigger.dev/@trigger.dev/react-hooks@1424

commit: d7b246e

packages/core/src/v3/errors.ts

coderabbitai

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (6)

.changeset/eight-turtles-itch.md (1)
1-9: Enhance changeset descriptions with more details about impact and benefits.

The version bumps are appropriate for these types of changes. However, the change descriptions could be more detailed to better communicate the impact and benefits to users.

Consider expanding the descriptions like this:
- Include retries.default in task retry config when indexing
- New helpers for internal error retry mechanics
- Detection for segfaults and ffmpeg OOM errors
- Retries for packet import and export
+ Include retries.default in task retry config when indexing, providing more flexible retry strategies
+ Add new helpers for internal error retry mechanics, improving system reliability
+ Implement detection for segfaults and ffmpeg OOM errors, preventing system crashes
+ Add automatic retries for packet import/export operations, enhancing data transfer reliability
apps/webapp/app/v3/services/crashTaskRun.server.ts (1)
124-125: Consider using structured logging for event IDs.

While the logging is informative, consider structuring the event IDs for better log parsing.
-    logger.debug("[CrashTaskRunService] Crashing in-progress events", {
-      inProgressEvents: inProgressEvents.map((event) => event.id),
+    logger.debug("[CrashTaskRunService] Crashing in-progress events", {
+      eventCount: inProgressEvents.length,
+      eventIds: inProgressEvents.map((event) => event.id),
+      runId
     });
packages/core/src/v3/apiClient/core.ts (2)
Line range hint 232-251: Review empty catch block for ValidationError.

The empty catch block for ValidationError silently ignores validation issues, which could make debugging harder and mask real problems. This doesn't align with the PR's goal of improving error handling.

Consider one of these approaches:
     if (error instanceof ApiError) {
       throw error;
     }
 
     if (error instanceof ValidationError) {
+      // Option 1: Log and rethrow
+      console.error('Validation error:', error);
+      throw error;
+      
+      // Option 2: Convert to ApiError
+      throw new ApiSchemaValidationError({
+        status: 400,
+        cause: error,
+        message: error.message,
+        headers: {},
+      });
     }
Line range hint 232-277: Consider enhancing error detection for specific failure cases.

While the changes improve JSON parsing safety, there might be opportunities to further align with the PR objectives by adding specific error detection for ffmpeg-related issues (OOM errors and segfaults) mentioned in the PR description.

Consider:

Adding specific error detection in the response parsing:
function isFFmpegError(error: unknown): boolean {
  return error instanceof Error && (
    error.message.includes("out of memory") ||
    error.message.includes("segmentation fault")
  );
}
Updating the retry logic to handle these cases:
     if (error instanceof ApiError) {
       throw error;
     }
 
+    if (isFFmpegError(error)) {
+      // Consider different retry strategy for FFmpeg errors
+      const retryDelay = calculateFFmpegRetryDelay(attempt);
+      if (retryDelay) {
+        await waitForRetry(url, attempt + 1, retryDelay, options, requestInit);
+        return await _doZodFetchWithRetries(schema, url, requestInit, options, attempt + 1);
+      }
+    }
packages/core/src/v3/workers/taskExecutor.ts (2)
96-114: LGTM! Consider enhancing error message readability.

The error handling implementation for payload import is robust and well-structured. It properly captures and formats errors while maintaining observability through span recording.

Consider using template literals for better readability of the error message:
-                  inputError instanceof Error
-                    ? `${inputError.name}: ${inputError.message}`
-                    : typeof inputError === "string"
-                    ? inputError
-                    : undefined,
+                  inputError instanceof Error
+                    ? `${inputError.name}: ${inputError.message}`
+                    : typeof inputError === "string"
+                    ? `Unknown error: ${inputError}`
+                    : "Unknown error occurred",
Line range hint 154-170: Maintain consistent error message formatting across error handlers.

The error handling for output stringification is well-implemented, but the error message formatting differs from the payload import error handler. Consider maintaining consistency for better error tracking and debugging.

Apply this change to match the payload import error message format:
-                    outputError instanceof Error
-                      ? outputError.message
-                      : typeof outputError === "string"
-                      ? outputError
-                      : undefined,
+                    outputError instanceof Error
+                      ? `${outputError.name}: ${outputError.message}`
+                      : typeof outputError === "string"
+                      ? outputError
+                      : undefined,

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 8bb7861 and 25a7578.

📒 Files selected for processing (7)

.changeset/eight-turtles-itch.md (1 hunks)
apps/webapp/app/v3/failedTaskRun.server.ts (3 hunks)
apps/webapp/app/v3/services/crashTaskRun.server.ts (4 hunks)
packages/core/src/v3/apiClient/core.ts (2 hunks)
packages/core/src/v3/errors.ts (6 hunks)
packages/core/src/v3/schemas/common.ts (1 hunks)
packages/core/src/v3/workers/taskExecutor.ts (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

packages/core/src/v3/schemas/common.ts

🧰 Additional context used

📓 Learnings (1)

apps/webapp/app/v3/failedTaskRun.server.ts (1)

Learnt from: nicktrn
PR: triggerdotdev/trigger.dev#1424
File: apps/webapp/app/v3/failedTaskRun.server.ts:136-139
Timestamp: 2024-10-22T10:34:42.602Z
Learning: In the `apps/webapp` project, it's acceptable to use private class methods with `#` syntax in TypeScript code.

🪛 Biome

packages/core/src/v3/errors.ts

[error] 155-189: This case is falling through to the next case.

Add a break or return statement to the end of this case to prevent fallthrough.

(lint/suspicious/noFallthroughSwitchClause)

🔇 Additional comments (10)

apps/webapp/app/v3/services/crashTaskRun.server.ts (3)
39-39: LGTM: Improved error logging with better traceability.

The addition of the "[CrashTaskRunService]" prefix and enhanced error details in log messages will make debugging easier.

Also applies to: 45-48

77-82: LGTM: Clean completion override logic.

The completion override logic is well-structured with appropriate guards and logging.

173-196: ⚠️ Potential issue

Implement container cleanup after fatal errors.

According to the PR objectives, container cleanup should be performed after fatal errors. Consider adding container cleanup logic in the attempt failure handling.

Let's verify if container cleanup is implemented elsewhere:

If container cleanup is not implemented elsewhere, consider adding it:
 async #failAttempt(
   attempt: TaskRunAttempt,
   run: TaskRun,
   failedAt: Date,
   environment: AuthenticatedEnvironment,
   error: {
     reason: string;
     logs?: string;
     code?: TaskRunInternalError["code"];
   }
 ) {
   return await this.traceWithEnv(
     "[CrashTaskRunService] failAttempt()",
     environment,
     async (span) => {
       span.setAttribute("taskRunId", run.id);
       span.setAttribute("attemptId", attempt.id);

+      // Cleanup container resources if available
+      if (attempt.backgroundWorker?.containerId) {
+        try {
+          await this.cleanupContainer(attempt.backgroundWorker.containerId);
+        } catch (error) {
+          logger.error("[CrashTaskRunService] Failed to cleanup container", {
+            containerId: attempt.backgroundWorker.containerId,
+            error,
+          });
+        }
+      }

       await this._prisma.taskRunAttempt.update({
         where: {
           id: attempt.id,
         },
         data: {
           status: "FAILED",
           completedAt: failedAt,
           error: sanitizeError({
             type: "INTERNAL_ERROR",
             code: error.code ?? "TASK_RUN_CRASHED",
             message: error.reason,
             stackTrace: error.logs,
           }),
         },
       });
     }
   );
 }
apps/webapp/app/v3/failedTaskRun.server.ts (5)
1-30: LGTM! Well-structured imports and type definitions.

The imports are properly organized, and the type definitions are clear and well-documented. The Prisma include configuration for attempts is properly structured with clear ordering.

Line range hint 32-74: LGTM! Robust error handling with proper logging.

The service effectively handles failed task runs with comprehensive error checking and proper logging. The delegation of retry logic to FailedTaskRunRetryHelper promotes good separation of concerns.

81-134: LGTM! Well-structured retry handling with proper error management.

The call method effectively orchestrates the retry process with appropriate error handling and logging.

190-280: ⚠️ Potential issue

Potential issue with delay validation.

The condition if (!delay) on line 259 could incorrectly handle a delay of 0 (immediate retry). Consider using explicit null check.

Apply this diff:
- if (!delay) {
+ if (delay === undefined) {
Likely invalid or redundant comment.

136-188: Consider adding error recovery mechanism for attempt creation failures.

While the error handling is present, the method silently returns undefined on attempt creation failure. Consider implementing a fallback mechanism or retrying the attempt creation.
packages/core/src/v3/errors.ts (2)

449-465: Previous regex refactor suggestion still applies

182-184: Address TODO comment regarding OOM error handling

The TODO comment raises an important concern about retrying tasks that exit with non-zero codes, as these could be OOM errors. Retrying OOM errors without addressing the underlying memory issues could lead to resource exhaustion.

Let's analyze the error patterns to help make an informed decision:

Would you like me to help implement memory usage detection logic to better distinguish between OOM and other non-zero exit scenarios?

apps/webapp/app/v3/services/crashTaskRun.server.ts

packages/core/src/v3/apiClient/core.ts

packages/core/src/v3/errors.ts

…rrors

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)

apps/webapp/app/v3/failedTaskRun.server.ts (1)
Line range hint 38-53: Consider adding type narrowing for the taskRun lookup.

While the error handling is good, we could improve type safety by adding a type guard after the findFirst operation.
 const taskRun = await this._prisma.taskRun.findFirst({
   where: {
     friendlyId: isFriendlyId ? anyRunId : undefined,
     id: !isFriendlyId ? anyRunId : undefined,
   },
+  // Add select to specify exactly what we need
+  select: {
+    id: true,
+    status: true,
+  },
 });
packages/core/src/v3/errors.ts (1)
Line range hint 413-446: Consider standardizing error message structure.

For consistency with other error messages, consider moving the conditional phrase to the end:
-      "Your task crashed with a segmentation fault (SIGSEGV). Most likely there's a bug in a package or binary you're using. If this keeps happening and you're unsure why, please get in touch.",
+      "Your task crashed with a segmentation fault (SIGSEGV). Most likely there's a bug in a package or binary you're using. Please get in touch if this keeps happening and you're unsure why.",
internal-packages/database/prisma/schema.prisma (1)
1967-1971: Consider enhancing enum documentation for consistency.

The status categorization is good, but consider adding descriptive comments for each status similar to TaskRunStatus enum for better consistency and clarity.

Add descriptive comments for each status:
 /// NON-FINAL
+/// Task is waiting to be executed
 PENDING
+/// Task is currently being executed
 EXECUTING
+/// Task has been paused and can be resumed
 PAUSED
 /// FINAL
+/// Task has failed to complete due to an error
 FAILED
+/// Task has been canceled by the user
 CANCELED
+/// Task has completed successfully
 COMPLETED

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 25a7578 and d7b246e.

📒 Files selected for processing (3)

apps/webapp/app/v3/failedTaskRun.server.ts (3 hunks)
internal-packages/database/prisma/schema.prisma (3 hunks)
packages/core/src/v3/errors.ts (6 hunks)

🧰 Additional context used

📓 Learnings (1)

apps/webapp/app/v3/failedTaskRun.server.ts (1)

Learnt from: nicktrn
PR: triggerdotdev/trigger.dev#1424
File: apps/webapp/app/v3/failedTaskRun.server.ts:136-139
Timestamp: 2024-10-22T10:34:42.602Z
Learning: In the `apps/webapp` project, it's acceptable to use private class methods with `#` syntax in TypeScript code.

🪛 Biome

packages/core/src/v3/errors.ts

[error] 155-188: This case is falling through to the next case.

Add a break or return statement to the end of this case to prevent fallthrough.

(lint/suspicious/noFallthroughSwitchClause)

🔇 Additional comments (9)

apps/webapp/app/v3/failedTaskRun.server.ts (5)
1-30: LGTM! Well-structured imports and type definitions.

The imports are properly scoped, and the type definitions are clear and well-organized. The includeAttempts constant provides a clean structure for retrieving attempts with proper ordering.

76-79: Referring to previous review comment about explicit interfaces.

81-134: LGTM! Well-structured retry handling implementation.

The class provides a clean implementation of retry logic with proper error handling and logging.

257-266: ⚠️ Potential issue

Improve delay validation to handle zero delays correctly.

The current condition if (!delay) will treat a delay of 0 (immediate retry) as if there were no more retries.
- if (!delay) {
+ if (delay === undefined || delay === null) {
Likely invalid or redundant comment.

283-284: ⚠️ Potential issue

Update the TODO comment with the correct version.

The TODO comment indicates that the version needs to be updated. This is important as it affects the retry configuration behavior.
packages/core/src/v3/errors.ts (3)

153-202: LGTM! Well-structured error retry logic.

The implementation provides a clear and exhaustive categorization of errors into retryable and non-retryable cases. The use of assertExhaustive ensures type safety by catching any missing error codes at compile time.

🧰 Tools

🪛 Biome

[error] 155-188: This case is falling through to the next case.

Add a break or return statement to the end of this case to prevent fallthrough.

(lint/suspicious/noFallthroughSwitchClause)

448-464: Previous regex suggestion is still applicable.

Line range hint 471-595: Previous suggestion to extract common signal handling logic is still applicable.

internal-packages/database/prisma/schema.prisma (1)

Line range hint 1805-1832: Well-structured enum with clear status categorization!

The separation of statuses into "NON-FINAL" and "FINAL" categories with clear documentation improves code maintainability. The new statuses (RETRYING_AFTER_FAILURE, CRASHED, etc.) effectively support the enhanced error handling and retry mechanisms.

apps/webapp/app/v3/failedTaskRun.server.ts

nicktrn and others added 30 commits October 21, 2024 14:40

refactor finalize run service

7bfaf74

refactor complete attempt service

e81d5bc

remove separate graceful exit handling

55cb6a9

refactor task status helpers

8699d6b

clearly separate statuses in prisma schema

0c37df7

all non-final statuses should be failable

f710326

new import payload error code

0b39a4c

store default retry config if none set on task

f92d82f

failed run service now respects retries

9350770

fix merged task retry config indexing

9f6696d

some errors should never be retried

aee555a

finalize run service takes care of acks now

008de3b

execution payload helper now with single object arg

fc59ce2

internal error code enum export

ccda672

unify failed and crashed run retries

78ef834

Prevent uncaught socket ack exceptions (#1415)

cba5885

* catch all the remaining socket acks that could possibly throw * wrap the remaining handlers in try catch

Added a new Context page for the docs (#1416)

3eb2f97

* Added a new context page with task context properties * Removed code comments * Added more crosslinks

Fix updating many environment variables at once (#1413)

0839824

Move code example to the side menu

43085ff

New docs example for creating a HN email summary

8e4acc4

doc: add instructions to create new reference project and run it loca…

45a34f9

…lly (#1417) * doc: add instructions to create new reference project and run it locally * doc: Add instruction for running tunnel * minor language improvement

chore: Update version for release (#1410)

cb50c67

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

Release 3.0.13

339f4ac

capture ffmpeg oom errors

0921733

respect maxAttempts=1 when failing before first attempt creation

2e8bc83

request worker exit on fatal errors

53551e0

fix error code merge

8111a9a

add new error code to should retry

c5813c4

nicktrn added 9 commits October 22, 2024 07:42

pretty segfault errors

d3719b1

pretty internal errors for attempt spans

5df1807

decrease oom false positives

757cc21

fix timeline event color for failed runs

38f4176

auto-retry packet import and export

f3629af

add sdk version check and complete event while completing attempt

2ae2a7c

all internal errors become crashes by default

eed71f9

Merge remote-tracking branch 'origin/main' into feat/auto-reattempt

78a3d43

use pretty error helpers exclusively

8bb7861

coderabbitai bot reviewed Oct 22, 2024

View reviewed changes

nicktrn commented Oct 22, 2024

View reviewed changes

packages/core/src/v3/errors.ts Outdated Show resolved Hide resolved

nicktrn added 7 commits October 22, 2024 12:12

error to debug log

a2293b6

zodfetch fixes

ab86276

rename import payload to task input error

56095a8

fix true non-zero exit error display

0e1b21c

fix retry config parsing

d1c38de

correctly mark crashes as crashed

cd8dffa

add changeset

25a7578

coderabbitai bot reviewed Oct 23, 2024

View reviewed changes

apps/webapp/app/v3/services/crashTaskRun.server.ts Show resolved Hide resolved

packages/core/src/v3/apiClient/core.ts Show resolved Hide resolved

packages/core/src/v3/errors.ts Show resolved Hide resolved

nicktrn added 3 commits October 24, 2024 10:52

Merge remote-tracking branch 'origin/main' into feat/retry-internal-e…

6e1d4db

…rrors

remove non-zero exit comment

896edf1

pretend we don't support default default retry configs yet

d7b246e

nicktrn merged commit 212f853 into main Oct 24, 2024
7 of 8 checks passed

nicktrn deleted the feat/retry-internal-errors branch October 24, 2024 10:12

github-actions bot mentioned this pull request Oct 24, 2024

chore: Update version for release #1423

Merged

coderabbitai bot reviewed Oct 24, 2024

View reviewed changes

apps/webapp/app/v3/failedTaskRun.server.ts Show resolved Hide resolved

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Automatically reattempt after internal errors #1424

Automatically reattempt after internal errors #1424

nicktrn commented Oct 22, 2024 •

edited by coderabbitai bot

Loading

changeset-bot bot commented Oct 22, 2024 •

edited

Loading

coderabbitai bot commented Oct 22, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

pkg-pr-new bot commented Oct 22, 2024 •

edited

Loading

coderabbitai bot left a comment

coderabbitai bot left a comment

Automatically reattempt after internal errors #1424

Automatically reattempt after internal errors #1424

Conversation

nicktrn commented Oct 22, 2024 • edited by coderabbitai bot Loading

Summary by CodeRabbit

Summary by CodeRabbit

changeset-bot bot commented Oct 22, 2024 • edited Loading

🦋 Changeset detected

coderabbitai bot commented Oct 22, 2024 • edited Loading

Walkthrough

Changes

Suggested reviewers

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

pkg-pr-new bot commented Oct 22, 2024 • edited Loading

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

nicktrn commented Oct 22, 2024 •

edited by coderabbitai bot

Loading

changeset-bot bot commented Oct 22, 2024 •

edited

Loading

coderabbitai bot commented Oct 22, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

pkg-pr-new bot commented Oct 22, 2024 •

edited

Loading